SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND
METHODS OF PREPARATION

Statement of Government Rights 

The invention was made at least in part with a grant from the
Government of the United States of America (grant DMI-9402762 from the
National Science Foundation). The Government has certain rights to the
invention.

Background of the Invention

Transcription, the synthesis of an RNA molecule from a sequence of
DNA, is the first step in gene expression. Sequences which regulate DNA
transcription include promoter sequences, polyadenylation signals, transcription
factor binding sites and enhancer elements. A promoter is a DNA sequence
capable of specific initiation of transcription and consists of three general
regions. The core promoter is the sequence where the RNA polymerase and its
cofactors bind to the DNA. Immediately upstream of the core promoter is the
proximal promoter, which contains several transcription factor binding sites that
are responsible for the assembly of an activation complex that in turn recruits the
polymerase complex. The distal promoter, located further upstream of the
proximal promoter, also contains transcription factor binding sites. Transcription
termination and polyadenylation, like transcription initiation, are site specific
and encoded by defined sequences. Enhancers are regulatory regions, containing
multiple transcription factor binding sites, that can significantly increase the
level of transcription from a responsive promoter regardless of the enhancer's
orientation and distance with respect to the promoter, as long as the enhancer and
promoter are located within the same DNA molecule. The amount of transcript
produced from a gene may also be regulated by a post-transcriptional
mechanism, the most important being RNA splicing, which removes intervening
sequences (introns) from a primary transcript between splice donor and splice
acceptor sequences.

Natural selection is the hypothesis that genotype-environment
interactions occurring at the phenotypic level lead to differential reproductive
success of individuals and therefore to modification of the gene pool of a
population.

Some properties of nucleic acid molecules that are acted upon by natural
selection include codon usage frequency, RNA secondary structure, the
efficiency of intron splicing, and interactions with transcription factors or other
nucleic acid binding proteins. Because of the degenerate nature of the genetic
code, these properties can be optimized by natural selection without altering the
corresponding amino acid sequence.
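The degeneracy of the genetic code can be illustrated with a minimal Python sketch. It is offered only as an illustration and is not part of the disclosed methods; the names CODON_TABLE and translate are assumptions introduced here. The sketch builds the standard genetic code and shows that two nucleotide sequences differing at every codon can encode the same amino acids.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two synonymous sequences: every codon differs, the encoded peptide does not.
print(translate("CTGGTGAGC"), translate("TTAGTTTCA"))
```

Both calls return the same three-residue peptide (Leu-Val-Ser), which is the property that permits codon changes without changes to the encoded polypeptide.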

Under some conditions, it is useful to synthetically alter the natural
nucleotide sequence encoding a polypeptide to better adapt the polypeptide for
alternative applications. A common example is to alter the codon usage
frequency of a gene when it is expressed in a foreign host cell. Although
redundancy in the genetic code allows amino acids to be encoded by multiple
codons, different organisms favor some codons over others. It has been found
that the efficiency of protein translation in a non-native host cell can be
substantially increased by adjusting the codon usage frequency but maintaining
the same gene product (U.S. Patent Nos. 5,096,825, 5,670,356, and 5,874,304).
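Codon usage frequency can be tabulated directly from a coding sequence. The following Python sketch is a hypothetical illustration only; the function name codon_usage is an assumption, and the sketch reports the fraction of codons in a sequence contributed by each codon rather than any organism-wide usage table.

```python
from collections import Counter


def codon_usage(dna):
    """Return the fraction of codons in the sequence contributed by each codon."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


# Example: CTG accounts for half of the four codons in this short sequence.
usage = codon_usage("CTGCTGTTAGTG")
print(usage)
```

Comparing such a table for a gene with the usage table of the intended host cell indicates which codons are candidates for replacement.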

However, altering codon usage may, in turn, result in the unintentional
introduction into a synthetic nucleic acid molecule of inappropriate transcription
regulatory sequences. This may adversely affect transcription, resulting in
anomalous expression of the synthetic DNA. Anomalous expression is defined
as departure from normal or expected levels of expression. For example,
transcription factor binding sites located downstream from a promoter have been
demonstrated to affect promoter activity (Michael et al., 1990; Lamb et al., 1998;
Johnson et al., 1998; Jones et al., 1997). Additionally, it is not uncommon for
an enhancer element to exert activity and result in elevated levels of DNA
transcription in the absence of a promoter sequence or for the presence of
transcription regulatory sequences to increase the basal levels of gene expression
in the absence of a promoter sequence.

Thus, what is needed is a method for making synthetic nucleic acid
molecules with altered codon usage without also introducing inappropriate or
unintended transcription regulatory sequences for expression in a particular host
cell.

Summary of the Invention 

The invention provides a synthetic nucleic acid molecule comprising at
least 300 nucleotides of a coding region for a polypeptide, having a codon
composition differing at more than 25% of the codons from a wild type nucleic
acid sequence encoding a polypeptide, and having at least 3-fold fewer,
preferably at least 5-fold fewer, transcription regulatory sequences than would
result if the differing codons were randomly selected. Preferably, the synthetic
nucleic acid molecule encodes a polypeptide that has an amino acid sequence
that is at least 85%, preferably 90%, and most preferably 95% or 99% identical
to the amino acid sequence of the naturally-occurring (native or wild type)
polypeptide (protein) from which it is derived. Thus, it is recognized that some
specific amino acid changes may also be desirable to alter a particular
phenotypic characteristic of the polypeptide encoded by the synthetic nucleic
acid molecule. Preferably, the amino acid sequence identity is over at least 100
contiguous amino acid residues. In one embodiment of the invention, the codons
in the synthetic nucleic acid molecule that differ preferably encode the same
amino acids as the corresponding codons in the wild type nucleic acid sequence.
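The proportion of codons that differ between a wild type sequence and a synthetic sequence can be computed codon by codon. The Python sketch below is illustrative only; the function name is an assumption, and it presumes that the two coding sequences are of equal length and in the same reading frame.

```python
def fraction_of_codons_differing(wild_type, synthetic):
    """Fraction of in-frame codon positions at which two coding sequences differ."""
    if len(wild_type) != len(synthetic):
        raise ValueError("sequences must have the same length")
    if not wild_type or len(wild_type) % 3:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    n_codons = len(wild_type) // 3
    differing = sum(
        wild_type[i:i + 3] != synthetic[i:i + 3]
        for i in range(0, len(wild_type), 3)
    )
    return differing / n_codons


# Two of the four codons differ (CTG/TTA and AGC/TCA), i.e., 50% of the codons.
print(fraction_of_codons_differing("CTGGTGAGCAAG", "TTAGTGTCAAAG"))
```

A value above 0.25 corresponds to a codon composition differing at more than 25% of the codons.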

The transcription regulatory sequences which are reduced in the synthetic
nucleic acid molecule include, but are not limited to, any combination of
transcription factor binding sequences, intron splice sites, poly(A) addition sites,
enhancer sequences and promoter sequences. Transcription regulatory sequences
are well known in the art.
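Counting candidate transcription regulatory sequences in a nucleic acid sequence can be sketched as a simple motif search. The Python example below is a hypothetical illustration: the two motifs shown (a canonical poly(A) signal and a TATA-like element) are placeholders chosen for this sketch, the names EXAMPLE_MOTIFS and count_motifs are assumptions, and a practical analysis would rely on a curated collection of regulatory sequences and would typically examine both strands.

```python
# Illustrative motifs only; these are assumptions for this sketch and not a
# list taken from the disclosure.
EXAMPLE_MOTIFS = {
    "poly(A) signal": "AATAAA",
    "TATA-like element": "TATAAA",
}


def count_motifs(dna, motifs=EXAMPLE_MOTIFS):
    """Count (possibly overlapping) occurrences of each motif on the given strand."""
    counts = {}
    for name, motif in motifs.items():
        n = 0
        start = dna.find(motif)
        while start != -1:
            n += 1
            start = dna.find(motif, start + 1)
        counts[name] = n
    return counts


print(count_motifs("GGAATAAACCTATAAAGG"))
```

Comparing the totals for a parent sequence and a synthetic sequence gives the fold reduction in the number of such sequences.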

It is preferred that the synthetic nucleic acid molecule of the invention
has a codon composition that differs from that of the wild type nucleic acid
sequence at more than 30%, 35%, 40% or more than 45%, e.g., 50%, 55%, 60%
or more of the codons. Preferred codons for use in the invention are those which
are employed more frequently than at least one other codon for the same amino
acid in a particular organism and, more preferably, are also not low-usage
codons in that organism and are not low-usage codons in the organism used to
clone or screen for the expression of the synthetic nucleic acid molecule (for
example, E. coli). Moreover, preferred codons for certain amino acids (i.e.,
those amino acids that have three or more codons) may include two or more
codons that are employed more frequently than the other (non-preferred)
codon(s). The presence of codons in the synthetic nucleic acid molecule that are
employed more frequently in one organism than in another organism results in a
synthetic nucleic acid molecule which, when introduced into the cells of the
organism that employs those codons more frequently, is expressed in those cells
at a level that is greater than the expression of the wild type or parent nucleic
acid sequence in those cells. For example, the synthetic nucleic acid molecule of
the invention is expressed at a level that is at least about 110%, e.g., 150%,
200%, 500% or more (1000%, 5000%, or 10000%) of that of the wild type
nucleic acid sequence in a cell or cell extract under identical conditions (such as
cell culture conditions, vector backbone, and the like).

In one embodiment of the invention, the codons that are different are
those employed more frequently in a mammal, while in another embodiment the
codons that are different are those employed more frequently in a plant. A
particular type of mammal, e.g., human, may have a different set of preferred
codons than another type of mammal. Likewise, a particular type of plant may
have a different set of preferred codons than another type of plant. In one
embodiment of the invention, the majority of the codons which differ are ones
that are preferred codons in a desired host cell. Preferred codons for mammals
(e.g., humans) and plants are known to the art (e.g., Wada et al., 1990). For
example, preferred human codons include, but are not limited to, CGC (Arg),
CTG (Leu), TCT (Ser), AGC (Ser), ACC (Thr), CCA (Pro), CCT (Pro), GCC
(Ala), GGC (Gly), GTG (Val), ATC (Ile), ATT (Ile), AAG (Lys), AAC (Asn),
CAG (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC (Tyr), TGC (Cys) and
TTC (Phe) (Wada et al., 1990). Thus, preferred "humanized" synthetic nucleic
acid molecules of the invention have a codon composition which differs from a
wild type nucleic acid sequence by having an increased number of the preferred
human codons, e.g., CGC, CTG, TCT, AGC, ACC, CCA, CCT, GCC, GGC,
GTG, ATC, ATT, AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC, TTC, or
any combination thereof. For example, the synthetic nucleic acid molecule of
the invention may have an increased number of CTG or TTG leucine-encoding
codons, GTG or GTC valine-encoding codons, GGC or GGT glycine-encoding
codons, ATC or ATT isoleucine-encoding codons, CCA or CCT proline-encoding
codons, CGC or CGT arginine-encoding codons, AGC or TCT serine-encoding
codons, ACC or ACT threonine-encoding codons, GCC or GCT
alanine-encoding codons, or any combination thereof, relative to the wild type
nucleic acid sequence. Similarly, synthetic nucleic acid molecules having an
increased number of codons that are employed more frequently in plants have a
codon composition which differs from a wild type or parent nucleic acid
sequence by having an increased number of the plant codons including, but not
limited to, CGC (Arg), CTT (Leu), TCT (Ser), TCC (Ser), ACC (Thr), CCA
(Pro), CCT (Pro), GCT (Ala), GGA (Gly), GTG (Val), ATC (Ile), ATT (Ile),
AAG (Lys), AAC (Asn), CAA (Gln), CAC (His), GAG (Glu), GAC (Asp), TAC
(Tyr), TGC (Cys), TTC (Phe), or any combination thereof (Murray et al., 1989).
Preferred codons may differ for different types of plants (Wada et al., 1990).
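Replacement of codons with preferred human codons can be sketched in Python as follows. This is a simplified, hypothetical illustration: it assumes one preferred codon per amino acid (the first human codon listed for that amino acid in the preceding paragraph), leaves methionine, tryptophan and stop codons unchanged, and makes no attempt to avoid introducing transcription regulatory sequences. The names PREFERRED_HUMAN and humanize are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One preferred human codon per amino acid (assumption: the first codon listed
# for each amino acid among the preferred human codons).
PREFERRED_HUMAN = {
    "R": "CGC", "L": "CTG", "S": "TCT", "T": "ACC", "P": "CCA", "A": "GCC",
    "G": "GGC", "V": "GTG", "I": "ATC", "K": "AAG", "N": "AAC", "Q": "CAG",
    "H": "CAC", "E": "GAG", "D": "GAC", "Y": "TAC", "C": "TGC", "F": "TTC",
}


def humanize(dna):
    """Replace each codon with the preferred human codon for the same amino acid."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        # Codons for Met, Trp and stop are kept as they are.
        out.append(PREFERRED_HUMAN.get(amino_acid, codon))
    return "".join(out)


print(humanize("TTAGTTTCAATGTAA"))
```

Because each replacement codon encodes the same amino acid as the codon it replaces, the encoded polypeptide is unchanged.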
The choice of codon may be influenced by many factors such as, for
example, the desire to have an increased number of nucleotide substitutions or
decreased number of transcription regulatory sequences. Under some
circumstances (e.g., to permit removal of a transcription factor binding site) it
may be desirable to replace a non-preferred codon with a codon other than a
preferred codon or a codon other than the most preferred codon. Under other
circumstances, for example, to prepare codon distinct versions of a synthetic
nucleic acid molecule, preferred codon pairs are selected based upon the largest
number of mismatched bases, as well as the criteria described above.
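Selection of a codon pair having the largest number of mismatched bases can be sketched as follows. The Python example is illustrative only; the function names are assumptions, and the sketch considers the mismatch criterion alone, not the other criteria described above.

```python
from itertools import combinations


def mismatches(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def most_distinct_pair(codons):
    """Return the pair of codons with the largest number of mismatched bases."""
    return max(combinations(codons, 2), key=lambda pair: mismatches(*pair))


# Among these serine codons, TCT and AGC differ at all three positions.
print(most_distinct_pair(["TCT", "AGC", "TCC"]))
```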

The presence of codons in the synthetic nucleic acid molecule that are
employed more frequently in one organism than in another organism results in a
synthetic nucleic acid molecule which, when introduced into a cell of the
organism that employs those codons, is expressed in that cell at a level which is
greater than the level of expression of the wild type or parent nucleic acid
sequence.

A synthetic nucleic acid molecule of the invention may encode a
selectable marker protein or a reporter molecule. However, the invention
applies to any gene and is not limited to synthetic reporter genes or synthetic
selectable marker genes. In one embodiment of a synthetic nucleic acid
molecule of the invention that is a reporter molecule, the synthetic nucleic acid
molecule encodes a luciferase having a codon composition different than that of
a wild type or parent Renilla luciferase or a beetle luciferase nucleic acid
sequence. A synthetic click beetle luciferase nucleic acid molecule of the
invention may optionally encode the amino acid valine at position 224 (i.e., it
emits green light), or may optionally encode the amino acid histidine at position
224, histidine at position 247, isoleucine at position 346, glutamine at position
348 or a combination thereof (i.e., it emits red light). Preferred synthetic
luciferase nucleic acid molecules that are related to a wild type Renilla luciferase
nucleic acid sequence include, but are not limited to, SEQ ID NO:21 (Rlucver2)
or SEQ ID NO:22 (Rluc-final). Preferred synthetic luciferase nucleic acid
molecules that are related to click beetle luciferase nucleic acid sequences
include, but are not limited to, SEQ ID NO:7 (GRver5), SEQ ID NO:8 (GR6),
SEQ ID NO:9 (GRver5.1), SEQ ID NO:14 (RDver5), SEQ ID NO:15 (RD7),
SEQ ID NO:16 (RDver5.1), SEQ ID NO:17 (RDver5.2) or SEQ ID NO:18
(RD156-1H9).

The invention also provides an expression cassette. The expression
cassette of the invention comprises a synthetic nucleic acid molecule of the
invention operatively linked to a promoter that is functional in a cell. Preferred
promoters are those functional in mammalian cells and those functional in plant
cells. Optionally, the expression cassette may include other sequences, e.g.,
restriction enzyme recognition sequences and a Kozak sequence, and be a part of
a larger polynucleotide molecule such as a plasmid, cosmid, artificial
chromosome or vector, e.g., a viral vector.

Also provided is a host cell comprising the synthetic nucleic acid
molecule of the invention, an isolated polypeptide (e.g., a fusion polypeptide
encoded by the synthetic nucleic acid molecule of the invention), and
compositions and kits comprising the synthetic nucleic acid molecule of the
invention or the polypeptide encoded thereby in suitable container means and,
optionally, instruction means. Preferred isolated polypeptides include, but are
not limited to, those comprising SEQ ID NO:31 (GRver5.1), SEQ ID NO:226
(Rluc-final), or SEQ ID NO:223 (RD156-1H9).

The invention also provides a method to prepare a synthetic nucleic acid
molecule of the invention by genetically altering a parent (either a wild type or
another synthetic) nucleic acid sequence. The method may be used to prepare a
synthetic nucleic acid molecule encoding a polypeptide comprising at least 100
amino acids. One embodiment of the invention is directed to the preparation of
synthetic genes encoding reporter or selectable marker proteins. The method of
the invention may be employed to alter the codon usage frequency and decrease
the number of transcription regulatory sequences in any open reading frame or to
decrease the number of transcription regulatory sites in a vector backbone.
Preferably, the codon usage frequency in the synthetic nucleic acid molecule is
altered to reflect that of the host organism desired for expression of that nucleic
acid molecule while also decreasing the number of potential transcription
regulatory sequences relative to the parent nucleic acid molecule.

Thus, the invention provides a method to prepare a synthetic nucleic acid
molecule comprising an open reading frame. The method comprises altering
(e.g., decreasing or eliminating) a plurality of transcription regulatory sequences
in a parent (wild type or a synthetic) nucleic acid sequence that encodes a
polypeptide having at least 100 amino acids to yield a synthetic nucleic acid
molecule which has a decreased number of transcription regulatory sequences
and which preferably encodes the same amino acids as the parent nucleic acid
molecule. The transcription regulatory sequences are selected from the group
consisting of transcription factor binding sequences, intron splice sites, poly(A)
addition sites, enhancer sequences and promoter sequences, and the resulting
synthetic nucleic acid molecule has at least 3-fold fewer, preferably 5-fold fewer,
transcription regulatory sequences relative to the parent nucleic acid sequence.
The method also comprises altering greater than 25% of the codons in the
synthetic nucleic acid sequence which has a decreased number of transcription
regulatory sequences to yield a further synthetic nucleic acid molecule, wherein
the codons that are altered encode the same amino acids as those in the
corresponding position in the synthetic nucleic acid molecule which has a
decreased number of transcription regulatory sequences and/or in the parent
nucleic acid sequence. Preferably, the codons which are altered do not result in
an increase in transcriptional regulatory sequences. Preferably, the further
synthetic nucleic acid molecule encodes a polypeptide that has at least 85%,
preferably 90%, and most preferably 95% or 99% contiguous amino acid
sequence identity to the amino acid sequence of the polypeptide encoded by the
parent nucleic acid sequence.

Alternatively, the method comprises altering greater than 25% of the
codons in a parent nucleic acid sequence which encodes a polypeptide having at
least 100 amino acids to yield a codon-altered synthetic nucleic acid molecule,
wherein the codons that are altered encode the same amino acids as those present
in the corresponding positions in the parent nucleic acid sequence. Then, a
plurality of transcription regulatory sequences in the codon-altered synthetic
nucleic acid molecule is altered to yield a further synthetic nucleic acid
molecule. Preferably, the codons which are altered do not result in an increase in
transcriptional regulatory sequences. Also, preferably, the further synthetic
nucleic acid molecule encodes a polypeptide that has at least 85%, preferably
90%, and most preferably 95% or 99% contiguous amino acid sequence identity
to the amino acid sequence of the polypeptide encoded by the parent nucleic acid
sequence. Also provided is a synthetic (including a further synthetic) nucleic
acid molecule prepared by the methods of the invention.
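Percent amino acid sequence identity between the polypeptide encoded by a parent sequence and that encoded by a synthetic sequence can be estimated with a short Python sketch. The sketch is a simplification offered for illustration: it assumes the two amino acid sequences are already aligned, ungapped and of equal length, and the function name percent_identity is an assumption.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# One substitution in twenty residues corresponds to 95% identity.
parent = "MKVLAGTSDEMKVLAGTSDE"
variant = "MKVLAGTSDEMKILAGTSDE"
print(percent_identity(parent, variant))
```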

As described hereinbelow, the methods of the invention were employed
with click beetle luciferase and Renilla luciferase nucleic acid sequences. While
both of these nucleic acid molecules encode luciferase proteins, they are from
entirely different families and are widely separated evolutionarily. These
proteins have unrelated amino acid sequences and protein structures, and they
utilize dissimilar chemical substrates. The fact that they share the name "luciferase"
should not be interpreted to mean that they are from the same family, or even
largely similar families. The methods produced synthetic luciferase nucleic acid
molecules which exhibited significantly enhanced levels of mammalian
expression without negatively affecting other desirable physical or biochemical
properties (including protein half-life) and which were also largely devoid of
known transcription regulatory elements.

The invention also provides at least two synthetic nucleic acid molecules
that encode highly related polypeptides, but which synthetic nucleic acid
molecules have an increased number of nucleotide differences relative to each
other. These differences decrease the recombination frequency between the two
synthetic nucleic acid molecules when those molecules are both present in a cell
(i.e., they are "codon distinct" versions of a synthetic nucleic acid molecule).
Thus, the invention provides a method for preparing at least two synthetic
nucleic acid molecules that are codon distinct versions of a parent nucleic acid
sequence that encodes a polypeptide. The method comprises altering a parent
nucleic acid sequence to yield a first synthetic nucleic acid molecule having an
increased number of a first plurality of codons that are employed more
frequently in a selected host cell relative to the number of those codons present
in the parent nucleic acid sequence. Optionally, the first synthetic nucleic acid
molecule also has a decreased number of transcription regulatory sequences
relative to the parent nucleic acid sequence. The parent nucleic acid sequence is
also altered to yield a second synthetic nucleic acid molecule having an
increased number of a second plurality of codons that are employed more
frequently in the host cell relative to the number of those codons in the parent
nucleic acid sequence, wherein the first plurality of codons is different than the
second plurality of codons, and wherein the first and the second synthetic nucleic
acid molecules preferably encode the same polypeptide. Optionally, the second
synthetic nucleic acid molecule has a decreased number of transcription
regulatory sequences relative to the parent nucleic acid sequence. Either or both
synthetic molecules can then be further modified.
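The preparation of codon distinct versions can be illustrated with a minimal Python sketch. It assumes two codon sets assembled from the human codon pairs recited above (e.g., CTG or TTG for leucine); the assignment of each codon to the first or second set, and the names FIRST_SET, SECOND_SET, encode and nucleotide_differences, are assumptions made for this example only.

```python
# Two alternative sets of frequently used codons for the same amino acids
# (one-letter codes). Which codon is placed in which set is an assumption.
FIRST_SET = {"L": "CTG", "V": "GTG", "G": "GGC", "I": "ATC", "P": "CCA",
             "R": "CGC", "S": "AGC", "T": "ACC", "A": "GCC"}
SECOND_SET = {"L": "TTG", "V": "GTC", "G": "GGT", "I": "ATT", "P": "CCT",
              "R": "CGT", "S": "TCT", "T": "ACT", "A": "GCT"}


def encode(peptide, codon_set):
    """Encode a peptide using one codon per amino acid from the given set."""
    return "".join(codon_set[aa] for aa in peptide)


def nucleotide_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


first = encode("LSVGA", FIRST_SET)
second = encode("LSVGA", SECOND_SET)
print(first, second, nucleotide_differences(first, second))
```

Both sequences encode the same five-residue peptide, yet they differ at 7 of 15 nucleotide positions, which is the kind of divergence expected to reduce recombination between the two molecules.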

Clearly, the present invention has applications with many genes and
across many fields of science including, but not limited to, life science research,
agrigenetics, genetic therapy, developmental science and pharmaceutical
development.

Brief Description of the Figures 

Figure 1. Codons and their corresponding amino acids.

Figure 2. A nucleotide sequence comparison of a yellow-green (YG)
click beetle luciferase nucleic acid sequence (YG#81-6G01; SEQ ID NO:2) and
various synthetic green (GR) click beetle luciferase nucleic acid sequences
(GRver1, SEQ ID NO:3; GRver2, SEQ ID NO:4; GRver3, SEQ ID NO:5;
GRver4, SEQ ID NO:6; GRver5, SEQ ID NO:7; GR6, SEQ ID NO:8; GRver5.1,
SEQ ID NO:9) and various red (RD) click beetle luciferase nucleic acid
sequences (RDver1, SEQ ID NO:10; RDver2, SEQ ID NO:11; RDver3, SEQ ID
NO:12; RDver4, SEQ ID NO:13; RDver5, SEQ ID NO:14; RD7, SEQ ID
NO:15; RDver5.1, SEQ ID NO:16; RDver5.2, SEQ ID NO:17; RD156-1H9,
SEQ ID NO:18). The nucleotides enclosed in boxes are nucleotides that differ
from the nucleotide present at the homologous position in SEQ ID NO:2.

Figure 3. An amino acid sequence comparison of a YG click beetle
luciferase amino acid sequence (YG#81-6G01, SEQ ID NO:24) and various
synthetic GR click beetle luciferase amino acid sequences (GRver1, SEQ ID
NO:25; GRver2, SEQ ID NO:26; GRver3, SEQ ID NO:27; GRver4, SEQ ID
NO:28; GRver5, SEQ ID NO:29; GR6, SEQ ID NO:30; GRver5.1, SEQ ID
NO:31) and various red (RD) click beetle luciferase amino acid sequences
(RDver1, SEQ ID NO:32; RDver2, SEQ ID NO:33; RDver3, SEQ ID NO:34;
RDver4, SEQ ID NO:218; RDver5, SEQ ID NO:219; RD7, SEQ ID NO:220;
RDver5.1, SEQ ID NO:221; RDver5.2, SEQ ID NO:222; RD156-1H9, SEQ ID
NO:223). All amino acid sequences are inferred from the corresponding
nucleotide sequence. The amino acids enclosed in boxes are amino acids that
differ from the amino acid present at the homologous position in SEQ ID NO:24.

Figure 4. Codon usage in YG#81-6G01, GRver1, RDver1, GRver5, and
RDver5, and humans (HUM) and relative codon usage in YG#81-6G01, GRver5,
RDver5, and humans.

Figure 5. Codon usage summaries for YG#81-6G01 (Figure 5A), and
GR/RD synthetic nucleic acid sequences, GRver1 (Figure 5B), RDver1 (Figure
5C), GRver2 (Figure 5D), RDver2 (Figure 5E), GRver3 (Figure 5F), RDver3
(Figure 5G), GRver4 (Figure 5H), RDver4 (Figure 5I), GRver5 (Figure 5J),
RDver5 (Figure 5K).

Figure 6. Oligonucleotides employed to prepare synthetic GR/RD 
luciferase genes (SEQ ID Nos. 35-245). 

Figure 7. A nucleotide sequence comparison of a wild type Renilla
reniformis luciferase nucleic acid sequence, Genbank Accession No. M63501
(RELLUC, SEQ ID NO:19), and various synthetic Renilla luciferase nucleic acid
sequences (Rlucver1, SEQ ID NO:20; Rlucver2, SEQ ID NO:21; Rluc-final,
SEQ ID NO:22). The nucleotides enclosed in boxes are nucleotides that differ
from the nucleotide present at the homologous position in SEQ ID NO:19.

Figure 8. An amino acid sequence comparison of a wild type Renilla
reniformis luciferase amino acid sequence (RELLUC, SEQ ID NO:224) and
various synthetic Renilla reniformis luciferase amino acid sequences (Rlucver1,
SEQ ID NO:225; Rlucver2, SEQ ID NO:226; Rluc-final, SEQ ID NO:227). All
amino acid sequences are inferred from the corresponding nucleotide sequence.
The amino acids enclosed in boxes are amino acids that differ from the amino
acid present at the homologous position in SEQ ID NO:224.

Figure 9. Codon usage in wild-type (A) versus synthetic (B) Renilla
luciferase genes. For codon usage in selected organisms, see, e.g., Wada et al.,
1990; Sharp et al., 1988; Aota et al., 1988; and Sharp et al., 1987, and for plant
codons, Murray et al., 1989.

Figure 10. Oligonucleotides employed to prepare synthetic Renilla
luciferase gene (SEQ ID Nos. 246-292).

Figure 11. A nucleotide sequence comparison of a wild type yellow-green
(YG) click beetle luciferase nucleic acid sequence (LUCPPLYG, SEQ ID
NO:1) and the synthetic green click beetle luciferase nucleic acid sequences
(GRver5.1, SEQ ID NO:9) and the synthetic red click beetle luciferase nucleic
acid sequences (RD156-1H9, SEQ ID NO:18). The nucleotides enclosed in
boxes are nucleotides that differ from the nucleotide present at the homologous
position in SEQ ID NO:1. Both synthetic sequences have a codon composition
that differs from LUCPPLYG at more than 25% of the codons and have at least
3-fold fewer transcription regulatory sequences relative to a random selection of
codons at the codons which differ.

Figure 12. An amino acid sequence comparison of a wild type YG click
beetle luciferase amino acid sequence (LUCPPLYG, SEQ ID NO:23) and the
synthetic GR click beetle luciferase amino acid sequences (GRver5.1, SEQ ID
NO:31) and the red (RD) click beetle luciferase amino acid sequences
(RD156-1H9, SEQ ID NO:223). All amino acid sequences are inferred from the
corresponding nucleotide sequence. The amino acids enclosed in boxes are
amino acids that differ from the amino acid present at the homologous position
in SEQ ID NO:23.

Figure 13. pRL vector series. All of the vectors contain the Renilla wild
type or synthetic gene as further described herein. Figure 13A illustrates the
Renilla luciferase gene in the pGL3 vectors (Promega Corp.). Figure 13B
illustrates the Renilla luciferase co-reporter vector series. pRL-TK has the
herpes simplex virus (HSV) tk promoter; pRL-SV40 has the SV40 virus early
enhancer/promoter; pRL-CMV has the cytomegalovirus (CMV) enhancer and
immediate early promoter; pRL-null has MCS (multiple cloning sites) but no
promoter or enhancer; pRL-TK(Int-) has the HSV tk promoter without an intron
that is present in the other plasmids; pR-GL3B has the pGL3-Basic backbone
(Promega Corp.); pR-GL3 TK has the pGL3-Basic backbone with an HSV tk
promoter.

Figure 14. Half-life of synthetic (Rluc-final) and native Renilla
luciferases in CHO cells.

Figures 15A-B. In vitro transcription/translation of Renilla luciferase 
nucleic acid sequences. A) t = 0-60 minutes; B) linear range. 

Figures 15C-D. In vitro translation of native and synthetic (Rluc-final)
Renilla luciferase RNAs in a rabbit reticulocyte lysate. RNA was quantitated
and the same amount was employed as in the translation reaction shown in
Figures 15A-B. C) t = 0-60 minutes; D) linear range.

Figures 15E-F. Translation of native and synthetic (Rluc-final) Renilla
RNAs in a wheat germ extract. E) t = 0-60 minutes; F) linear range.

Figure 16. High expression from a synthetic Renilla nucleic acid
sequence reduces the risk of promoter interference in a co-transfection assay.
CHO cells were co-transfected with a constant amount (50 ng) of firefly
luciferase expression vector (pGL3 control vector, with SV40 promoter and
enhancer; Luc-K) and a pRL vector having a native (0 ng, 50 ng, 100 ng, 500 ng,
1 µg or 2 µg) or synthetic (0 ng, 5 ng, 10 ng, 50 ng, 100 ng or 200 ng) Renilla
luciferase gene.

Figures 17A-B. The reactions catalyzed by firefly and click
beetle (17A), and Renilla (17B) luciferases.

Figure 18. Nucleotide and inferred amino acid sequence of click beetle
luciferases in pGL3 vectors (GRver5.1 in pGL3, SEQ ID NO:297 encoding SEQ
ID NO:298; RDver5.1 in pGL3, SEQ ID NO:299 encoding SEQ ID NO:300; and
RD156-1H9 in pGL3, SEQ ID NO:301 encoding SEQ ID NO:302). To clone
GRver5.1, RDver5.1, and RD156-1H9 nucleic acid sequences into pGL3
vectors, an oligonucleotide having an Nco I site at the initiation codon was
employed, which resulted in an amino acid substitution at position 2 to valine.

Detailed Description of the Invention

Definitions 

The term "gene" as used herein, refers to a DNA sequence that comprises
coding sequences necessary for the production of a polypeptide or protein
precursor. The polypeptide can be encoded by a full length coding sequence or
by any portion of the coding sequence, as long as the desired protein activity is
retained.

A "nucleic acid", as used herein, is a covalently linked sequence of
nucleotides in which the 3' position of the pentose of one nucleotide is joined by
a phosphodiester group to the 5' position of the pentose of the next, and in which
the nucleotide residues (bases) are linked in specific sequence, i.e., a linear order
of nucleotides. A "polynucleotide", as used herein, is a nucleic acid containing a
sequence that is greater than about 100 nucleotides in length. An
"oligonucleotide", as used herein, is a short polynucleotide or a portion of a
polynucleotide. An oligonucleotide typically contains a sequence of about two
to about one hundred bases. The word "oligo" is sometimes used in place of the
word "oligonucleotide".
Nucleic acid molecules are said to have a "5'-terminus" (5' end) and a
"3'-terminus" (3' end) because nucleic acid phosphodiester linkages occur to the
5' carbon and 3' carbon of the pentose ring of the substituent mononucleotides.
The end of a polynucleotide at which a new linkage would be to a 5' carbon is its
5' terminal nucleotide. The end of a polynucleotide at which a new linkage
would be to a 3' carbon is its 3' terminal nucleotide. A terminal nucleotide, as
used herein, is the nucleotide at the end position of the 3'- or 5'-terminus.

DNA molecules are said to have "5' ends" and "3' ends" because
mononucleotides are reacted to make oligonucleotides in a manner such that the
5' phosphate of one mononucleotide pentose ring is attached to the 3' oxygen of
its neighbor in one direction via a phosphodiester linkage. Therefore, an end of
an oligonucleotide is referred to as the "5' end" if its 5' phosphate is not linked to
the 3' oxygen of a mononucleotide pentose ring and as the "3' end" if its 3'
oxygen is not linked to a 5' phosphate of a subsequent mononucleotide pentose
ring.

As used herein, a nucleic acid sequence, even if internal to a larger
oligonucleotide or polynucleotide, also may be said to have 5' and 3' ends. In
either a linear or circular DNA molecule, discrete elements are referred to as
being "upstream" or 5' of the "downstream" or 3' elements. This terminology
reflects the fact that transcription proceeds in a 5' to 3' fashion along the DNA
strand. Typically, promoter and enhancer elements that direct transcription of a
linked gene are located 5' or upstream of the coding region. However,
enhancer elements can exert their effect even when located 3' of the promoter
element and the coding region. Transcription termination and polyadenylation
signals are located 3' or downstream of the coding region.

The term "codon" as used herein, is a basic genetic coding unit,
consisting of a sequence of three nucleotides that specify a particular amino acid
to be incorporated into a polypeptide chain, or a start or stop signal. Figure 1
contains a codon table. The term "coding region" when used in reference to a
structural gene refers to the nucleotide sequences that encode the amino acids
found in the nascent polypeptide as a result of translation of an mRNA molecule.
Typically, the coding region is bounded on the 5' side by the nucleotide triplet
"ATG" which encodes the initiator methionine and on the 3' side by a stop codon
(e.g., TAA, TAG, TGA). In some cases the coding region is also known to
initiate by a nucleotide triplet "TTG".
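
The bounds of a coding region described above can be illustrated with a minimal Python sketch. It scans the sense strand for the first "ATG" and reads triplets until an in-frame stop codon (TAA, TAG or TGA) is reached. The function name find_coding_region and the example sequence are hypothetical and introduced for illustration only; alternative initiators such as "TTG" are deliberately not handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna, start_codon="ATG"):
    """Return (start, end) of the first coding region found, or None.

    start is the index of the initiator codon and end is the index just
    past the first in-frame stop codon (hypothetical helper, DNA only).
    """
    dna = dna.upper()
    start = dna.find(start_codon)
    while start != -1:
        # Walk codon by codon in the reading frame set by the initiator.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop for this initiator; try the next candidate.
        start = dna.find(start_codon, start + 1)
    return None


# Hypothetical example: ATG GCC AAA TAA flanked by untranslated bases.
example = "ccATGGCCAAATAAgg"
region = find_coding_region(example)  # (2, 14)
```

For the example sequence the slice example[2:14] is "ATGGCCAAATAA", i.e., the initiator codon through the stop codon.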

By "protein" and "polypeptide" is meant any chain of amino acids,
regardless of length or post-translational modification (e.g., glycosylation or
phosphorylation). The synthetic genes of the invention may also encode a
variant of a naturally-occurring protein or polypeptide fragment thereof.
Preferably, such a protein or polypeptide has an amino acid sequence that is at least
85%, preferably 90%, and most preferably 95% or 99% identical to the amino
acid sequence of the naturally-occurring (native) protein from which it is
derived.

Polypeptide molecules are said to have an "amino terminus"
(N-terminus) and a "carboxy terminus" (C-terminus) because peptide linkages
occur between the backbone amino group of a first amino acid residue and the
backbone carboxyl group of a second amino acid residue. The terms
"N-terminal" and "C-terminal" in reference to polypeptide sequences refer to
regions of polypeptides including portions of the N-terminal and C-terminal
regions of the polypeptide, respectively. A sequence that includes a portion of
the N-terminal region of a polypeptide includes amino acids predominantly from
the N-terminal half of the polypeptide chain, but is not limited to such
sequences. For example, an N-terminal sequence may include an interior portion
of the polypeptide sequence including amino acids from both the N-terminal and
C-terminal halves of the polypeptide. The same applies to C-terminal regions.
N-terminal and C-terminal regions may, but need not, include the amino acid
defining the ultimate N-terminus and C-terminus of the polypeptide,
respectively.

The term "wild type" as used herein, refers to a gene or gene product that
has the characteristics of that gene or gene product isolated from a naturally
occurring source. A wild type gene is that which is most frequently observed in
a population and is thus arbitrarily designated the "wild type" form of the gene.
In contrast, the term "mutant" refers to a gene or gene product that displays
modifications in sequence and/or functional properties (i.e., altered
characteristics) when compared to the wild type gene or gene product. It is noted
that naturally-occurring mutants can be isolated; these are identified by the fact
that they have altered characteristics when compared to the wild type gene or
gene product.

The terms "complementary" or "complementarity" are used in reference
to a sequence of nucleotides related by the base-pairing rules. For example,
the sequence 5' "A-G-T" 3' is complementary to the sequence 3' "T-C-A" 5'.
Complementarity may be "partial," in which only some of the nucleic acids'
bases are matched according to the base pairing rules. Or, there may be
"complete" or "total" complementarity between the nucleic acids. The degree of
complementarity between nucleic acid strands has significant effects on the
efficiency and strength of hybridization between nucleic acid strands. This is of
particular importance in amplification reactions, as well as detection methods
which depend upon hybridization of nucleic acids.
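
The base-pairing rules can be sketched in a few lines of Python. This is a minimal illustration that assumes DNA strands of equal length containing only A, C, G and T; the names reverse_complement and fraction_complementary are hypothetical and are not part of any method described herein.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def fraction_complementary(strand_a, strand_b):
    """Fraction of positions that base pair when two strands, each given
    5' to 3', are aligned antiparallel (equal lengths assumed)."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    partner = strand_b.upper()[::-1]  # read strand_b in the 3' to 5' direction
    matches = sum(COMPLEMENT[a] == b for a, b in zip(strand_a.upper(), partner))
    return matches / len(strand_a)


# 5' A-G-T 3' pairs with 3' T-C-A 5', which reads 5' A-C-T 3'.
full = fraction_complementary("AGT", "ACT")     # complete complementarity
partial = fraction_complementary("AGT", "ACA")  # partial complementarity
```

A returned value of 1.0 corresponds to "complete" complementarity, and any value between 0 and 1 to "partial" complementarity.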

The term "recombinant protein" or "recombinant polypeptide" as used
herein refers to a protein molecule expressed from a recombinant DNA
molecule. In contrast, the term "native protein" is used herein to indicate a
protein isolated from a naturally occurring (i.e., a nonrecombinant) source.
Molecular biological techniques may be used to produce a recombinant form of a
protein with identical properties as compared to the native form of the protein.

The terms "fusion protein" and "fusion partner" refer to a chimeric
protein containing the protein of interest (e.g., luciferase) joined to an exogenous
protein fragment (e.g., a fusion partner which consists of a non-luciferase
protein). The fusion partner may, for example, enhance the solubility of the protein
as expressed in a host cell, provide an affinity tag to allow purification of
the recombinant fusion protein from the host cell or culture supernatant, or both.
If desired, the fusion partner may be removed from the protein of interest by a
variety of enzymatic or chemical means known to the art.

The terms "cell," "cell line," "host cell," as used herein, are used
interchangeably, and all such designations include progeny or potential progeny
of these designations. By "transformed cell" is meant a cell into which (or into
an ancestor of which) has been introduced a DNA molecule comprising a
synthetic gene. Optionally, a synthetic gene of the invention may be introduced
into a suitable cell line so as to create a stably-transfected cell line capable of
producing the protein or polypeptide encoded by the synthetic gene. Vectors,
cells, and methods for constructing such cell lines are well known in the art, e.g.,
in Ausubel, et al. (infra). The words "transformants" or "transformed cells"
include the primary transformed cells derived from the originally transformed
cell without regard to the number of transfers. All progeny may not be precisely
identical in DNA content, due to deliberate or inadvertent mutations.
Nonetheless, mutant progeny that have the same functionality as screened for in
the originally transformed cell are included in the definition of transformants.

Nucleic acids are known to contain different types of mutations. A
"point" mutation refers to an alteration in the sequence of a nucleotide at a single
base position from the wild type sequence. Mutations may also refer to insertion
or deletion of one or more bases, so that the nucleic acid sequence differs from
the wild-type sequence.

The term "homology" refers to a degree of complementarity. There may
be partial homology or complete homology (i.e., identity). Homology is often
measured using sequence analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of Wisconsin
Biotechnology Center, 1710 University Avenue, Madison, WI 53705). Such
software matches similar sequences by assigning degrees of homology to various
substitutions, deletions, insertions, and other modifications. Conservative
substitutions typically include substitutions within the following groups:
glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid,
asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine,
tyrosine.

A "partially complementary" sequence is one that at least partially
inhibits a completely complementary sequence from hybridizing to a target
nucleic acid and is referred to using the functional term "substantially homologous."
The inhibition of hybridization of the completely complementary sequence to the
target sequence may be examined using a hybridization assay (Southern or
Northern blot, solution hybridization and the like) under conditions of low
stringency. A substantially homologous sequence or probe will compete for and
inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a
target under conditions of low stringency. This is not to say that conditions of
low stringency are such that non-specific binding is permitted; low stringency
conditions require that the binding of two sequences to one another be a specific
(i.e., selective) interaction. The absence of non-specific binding may be tested
by the use of a second target which lacks even a partial degree of
complementarity (e.g., less than about 30% identity). In this case, in the absence
of non-specific binding, the probe will not hybridize to the second
non-complementary target.

When used in reference to a double-stranded nucleic acid sequence such
as a cDNA or a genomic clone, the term "substantially homologous" refers to
any probe which can hybridize to either or both strands of the double-stranded
nucleic acid sequence under conditions of low stringency as described herein.

"Probe" refers to an oligonucleotide designed to be sufficiently
complementary to a sequence in a denatured nucleic acid to be probed (in
relation to its length) to be bound under selected stringency conditions.

"Hybridization" and "binding" in the context of probes and denatured
(melted) nucleic acid are used interchangeably. Probes which are hybridized or
bound to denatured nucleic acid are base paired to complementary sequences in
the polynucleotide. Whether or not a particular probe remains base paired with
the polynucleotide depends on the degree of complementarity, the length of the
probe, and the stringency of the binding conditions. The higher the stringency,
the higher must be the degree of complementarity and/or the longer the probe.

The term "hybridization" is used in reference to the pairing of
complementary nucleic acid strands. Hybridization and the strength of
hybridization (i.e., the strength of the association between nucleic acid strands) are
impacted by many factors well known in the art, including the degree of
complementarity between the nucleic acids, the stringency of the conditions
involved (affected by such conditions as the concentration of salts), the Tm
(melting temperature) of the formed hybrid, the presence of other components
(e.g., the presence or absence of polyethylene glycol), the molarity of the
hybridizing strands and the G:C content of the nucleic acid strands.

The term "stringency" is used in reference to the conditions of
temperature, ionic strength, and the presence of other compounds, under which
nucleic acid hybridizations are conducted. With "high stringency" conditions,
nucleic acid base pairing will occur only between nucleic acid fragments that
have a high frequency of complementary base sequences. Thus, conditions of
"medium" or "low" stringency are often required when it is desired that nucleic
acids which are not completely complementary to one another be hybridized or
annealed together. The art knows well that numerous equivalent conditions can
be employed to comprise medium or low stringency conditions. The choice of
hybridization conditions is generally evident to one skilled in the art and is
usually guided by the purpose of the hybridization, the type of hybridization
(DNA-DNA or DNA-RNA), and the level of desired relatedness between the
sequences (see, e.g., Sambrook et al., 1989; Nucleic Acid Hybridization, A Practical
Approach, IRL Press, Washington D.C., 1985, for a general discussion of the
methods).

The stability of nucleic acid duplexes is known to decrease with an
increased number of mismatched bases, and further to be decreased to a greater
or lesser degree depending on the relative positions of mismatches in the hybrid
duplexes. Thus, the stringency of hybridization can be used to maximize or
minimize stability of such duplexes. Hybridization stringency can be altered by:
adjusting the temperature of hybridization; adjusting the percentage of helix
destabilizing agents, such as formamide, in the hybridization mix; and adjusting
the temperature and/or salt concentration of the wash solutions. For filter
hybridizations, the final stringency of hybridizations often is determined by the
salt concentration and/or temperature used for the post-hybridization washes.

"High stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 0.1X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Medium stringency conditions" when used in reference to nucleic acid
hybridization comprise conditions equivalent to binding or hybridization at 42°C
in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and
1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.5% SDS, 5X Denhardt's
reagent and 100 µg/ml denatured salmon sperm DNA followed by washing in a
solution comprising 1.0X SSPE, 1.0% SDS at 42°C when a probe of about 500
nucleotides in length is employed.

"Low stringency conditions" comprise conditions equivalent to binding
or hybridization at 42°C in a solution consisting of 5X SSPE (43.8 g/l NaCl, 6.9
g/l NaH2PO4·H2O and 1.85 g/l EDTA, pH adjusted to 7.4 with NaOH), 0.1%
SDS, 5X Denhardt's reagent [50X Denhardt's contains per 500 ml: 5 g Ficoll
(Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 µg/ml denatured
salmon sperm DNA followed by washing in a solution comprising 5X SSPE,
0.1% SDS at 42°C when a probe of about 500 nucleotides in length is employed.

The term "Tm" is used in reference to the "melting temperature". The
melting temperature is the temperature at which 50% of a population of
double-stranded nucleic acid molecules becomes dissociated into single strands.
The equation for calculating the Tm of nucleic acids is well-known in the art.
The Tm of a hybrid nucleic acid is often estimated using a formula adopted from
hybridization assays in 1 M salt, and commonly used for calculating Tm for PCR
primers: [(number of A + T) x 2°C + (number of G + C) x 4°C]. (C.R. Newton et
al., PCR, 2nd Ed., Springer-Verlag (New York, 1997), p. 24). This formula was
found to be inaccurate for primers longer than 20 nucleotides. (Id.) Another
simple estimate of the Tm value may be calculated by the equation: Tm = 81.5 +
0.41(% G + C), when a nucleic acid is in aqueous solution at 1 M NaCl. (See, e.g.,
Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid
Hybridization, 1985). Other more sophisticated computations exist in the art
which take structural as well as sequence characteristics into account for the
calculation of Tm. A calculated Tm is merely an estimate; the optimum
temperature is commonly determined empirically.
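
The two simple estimates given above can be written out as a short Python sketch. The function names tm_wallace and tm_percent_gc and the example primer are hypothetical, chosen for illustration; the arithmetic follows the two formulas as stated in the text, and the results are estimates only.

```python
def tm_wallace(seq):
    """Estimate Tm in degrees C as 2 per A or T plus 4 per G or C.

    Per the text, this rule is unreliable for primers longer than
    about 20 nucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_percent_gc(seq):
    """Estimate Tm in degrees C as 81.5 + 0.41 * (% G + C), 1 M NaCl."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 81.5 + 0.41 * (100.0 * gc / len(seq))


# Hypothetical 20-nucleotide primer with 10 A/T and 10 G/C residues.
primer = "ATGCATGCATGCATGCATGC"
wallace_estimate = tm_wallace(primer)  # 2*10 + 4*10 = 60
gc_estimate = tm_percent_gc(primer)    # 81.5 + 0.41*50
```

The two estimates differ widely for the same sequence because they were derived for different settings (short primers versus long hybrids in filter hybridization), which is one reason the optimum temperature is determined empirically.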

The term "isolated" when used in relation to a nucleic acid, as in "isolated
oligonucleotide" or "isolated polynucleotide" refers to a nucleic acid sequence
that is identified and separated from at least one contaminant with which it is
ordinarily associated in its source. Thus, an isolated nucleic acid is present in a
form or setting that is different from that in which it is found in nature. In
contrast, non-isolated nucleic acids (e.g., DNA and RNA) are found in the state
they exist in nature. For example, a given DNA sequence (e.g., a gene) is found
on the host cell chromosome in proximity to neighboring genes; RNA sequences
(e.g., a specific mRNA sequence encoding a specific protein), are found in the
cell as a mixture with numerous other mRNAs that encode a multitude of
proteins. However, isolated nucleic acid includes, by way of example, such
nucleic acid in cells ordinarily expressing that nucleic acid where the nucleic
acid is in a chromosomal location different from that of natural cells, or is
otherwise flanked by a different nucleic acid sequence than that found in nature.
The isolated nucleic acid or oligonucleotide may be present in single-stranded or
double-stranded form. When an isolated nucleic acid or oligonucleotide is to be
utilized to express a protein, the oligonucleotide contains, at a minimum, the
sense or coding strand (i.e., the oligonucleotide may be single-stranded), but may
contain both the sense and anti-sense strands (i.e., the oligonucleotide may be
double-stranded).

The term "isolated" when used in relation to a polypeptide, as in "isolated
protein" or "isolated polypeptide" refers to a polypeptide that is identified and
separated from at least one contaminant with which it is ordinarily associated in
its source. Thus, an isolated polypeptide is present in a form or setting that is
different from that in which it is found in nature. In contrast, non-isolated
polypeptides (e.g., proteins and enzymes) are found in the state they exist in
nature.

The term "purified" or "to purify" means the result of any process that
removes some of a contaminant from the component of interest, such as a protein
or nucleic acid. The percent of a purified component is thereby increased in the
sample.

The term "operably linked" as used herein refers to the linkage of nucleic
acid sequences in such a manner that a nucleic acid molecule capable of
directing the transcription of a given gene and/or the synthesis of a desired
protein molecule is produced. The term also refers to the linkage of sequences
encoding amino acids in such a manner that a functional (e.g., enzymatically
active, capable of binding to a binding partner, capable of inhibiting, etc.) protein
or polypeptide is produced.

The term "recombinant DNA molecule" means a hybrid DNA sequence
comprising at least two nucleotide sequences not normally found together in
nature. The term "vector" is used in reference to nucleic acid molecules
into which fragments of DNA may be inserted or cloned, which can be used to
transfer DNA segment(s) into a cell, and which are capable of replication in a cell.
Vectors may be derived from plasmids, bacteriophages, viruses, cosmids, and the like.

The terms "recombinant vector" and "expression vector" as used herein
refer to DNA or RNA sequences containing a desired coding sequence and
appropriate DNA or RNA sequences necessary for the expression of the operably
linked coding sequence in a particular host organism. Prokaryotic expression
vectors include a promoter, a ribosome binding site, an origin of replication for
autonomous replication in a host cell and possibly other sequences, e.g., an
optional operator sequence and optional restriction enzyme sites. A promoter is
defined as a DNA sequence that directs RNA polymerase to bind to DNA and to
initiate RNA synthesis. Eukaryotic expression vectors include a promoter,
optionally a polyadenylation signal and optionally an enhancer sequence.

The term "a polynucleotide having a nucleotide sequence encoding a
gene," means a nucleic acid sequence comprising the coding region of a gene, or
in other words the nucleic acid sequence which encodes a gene product. The
coding region may be present in either a cDNA, genomic DNA or RNA form.
When present in a DNA form, the oligonucleotide may be single-stranded (i.e.,
the sense strand) or double-stranded. Suitable control elements such as
enhancers/promoters, splice junctions, polyadenylation signals, etc. may be
placed in close proximity to the coding region of the gene if needed to permit
proper initiation of transcription and/or correct processing of the primary RNA
transcript. Alternatively, the coding region utilized in the expression vectors of
the present invention may contain endogenous enhancers/promoters, splice
junctions, intervening sequences, polyadenylation signals, etc. In further
embodiments, the coding region may contain a combination of both endogenous
and exogenous control elements.

The term "transcription regulatory element" or "transcription regulatory
sequence" refers to a genetic element or sequence that controls some aspect of
the expression of nucleic acid sequence(s). For example, a promoter is a
regulatory element that facilitates the initiation of transcription of an operably
linked coding region. Other regulatory elements include, but are not limited to,
transcription factor binding sites, splicing signals, polyadenylation signals,
termination signals and enhancer elements.

Transcriptional control signals in eukaryotes comprise "promoter" and
"enhancer" elements. Promoters and enhancers consist of short arrays of DNA
sequences that interact specifically with cellular proteins involved in
transcription (Maniatis et al., 1987). Promoter and enhancer elements have been
isolated from a variety of eukaryotic sources including genes in yeast, insect and
mammalian cells. Promoter and enhancer elements have also been isolated from
viruses, and analogous control elements, such as promoters, are also found in
prokaryotes. The selection of a particular promoter and enhancer depends on the
cell type used to express the protein of interest. Some eukaryotic promoters and
enhancers have a broad host range while others are functional in a limited subset
of cell types (for review, see Voss et al., 1986; and Maniatis et al., 1987). For
example, the SV40 early gene enhancer is very active in a wide variety of cell
types from many mammalian species and has been widely used for the
expression of proteins in mammalian cells (Dijkema et al., 1985). Other
examples of promoter/enhancer elements active in a broad range of mammalian
cell types are those from the human elongation factor 1α gene (Uetsuki et al.,
1989; Kim, et al., 1990; and Mizushima and Nagata, 1990) and the long terminal
repeats of the Rous sarcoma virus (Gorman et al., 1982) and the human
cytomegalovirus (Boshart et al., 1985).

The term "promoter/enhancer" denotes a segment of DNA containing
sequences capable of providing both promoter and enhancer functions (i.e., the
functions provided by a promoter element and an enhancer element as described
above). For example, the long terminal repeats of retroviruses contain both
promoter and enhancer functions. The enhancer/promoter may be "endogenous"
or "exogenous" or "heterologous." An "endogenous" enhancer/promoter is one
that is naturally linked with a given gene in the genome. An "exogenous" or
"heterologous" enhancer/promoter is one that is placed in juxtaposition to a gene
by means of genetic manipulation (i.e., molecular biological techniques) such
that transcription of the gene is directed by the linked enhancer/promoter.

The presence of "splicing signals" on an expression vector often results in
higher levels of expression of the recombinant transcript in eukaryotic host cells.
Splicing signals mediate the removal of introns from the primary RNA
transcript and consist of a splice donor and acceptor site (Sambrook, et al.,
Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Laboratory Press, New York, 1989, pp. 16.7-16.8). A commonly used splice
donor and acceptor site is the splice junction from the 16S RNA of SV40.

Efficient expression of recombinant DNA sequences in eukaryotic cells
requires expression of signals directing the efficient termination and
polyadenylation of the resulting transcript. Transcription termination signals are
generally found downstream of the polyadenylation signal and are a few hundred
nucleotides in length. The term "poly(A) site" or "poly(A) sequence" as used
herein denotes a DNA sequence which directs both the termination and
polyadenylation of the nascent RNA transcript. Efficient polyadenylation of the
recombinant transcript is desirable, as transcripts lacking a poly(A) tail are
unstable and are rapidly degraded. The poly(A) signal utilized in an expression
vector may be "heterologous" or "endogenous." An endogenous poly(A) signal
is one that is found naturally at the 3' end of the coding region of a given gene in
the genome. A heterologous poly(A) signal is one which has been isolated from
one gene and positioned 3' to another gene. A commonly used heterologous
poly(A) signal is the SV40 poly(A) signal. The SV40 poly(A) signal is
contained on a 237 bp BamHI/BclI restriction fragment and directs both
termination and polyadenylation (Sambrook, supra, at 16.6-16.7).

Eukaryotic expression vectors may also contain "viral replicons" or "viral
origins of replication." Viral replicons are viral DNA sequences which allow for
the extrachromosomal replication of a vector in a host cell expressing the
appropriate replication factors. Vectors containing either the SV40 or polyoma
virus origin of replication replicate to high copy number (up to 10^4 copies/cell)
in cells that express the appropriate viral T antigen. In contrast, vectors
containing the replicons from bovine papillomavirus or Epstein-Barr virus
replicate extrachromosomally at low copy number (about 100 copies/cell).

The term "in vitro" refers to an artificial environment and to processes or
reactions that occur within an artificial environment. In vitro environments
include, but are not limited to, test tubes and cell lysates. The term "in situ"
refers to cell culture. The term "in vivo" refers to the natural environment (e.g.,
an animal or a cell) and to processes or reactions that occur within a natural
environment.

The term "expression system" refers to any assay or system for
determining (e.g., detecting) the expression of a gene of interest. Those skilled
in the field of molecular biology will understand that any of a wide variety of
expression systems may be used. A wide range of suitable mammalian cells are
available from a wide range of sources (e.g., the American Type Culture
Collection, Rockville, MD). The method of transformation or transfection and
the choice of expression vehicle will depend on the host system selected.
Transformation and transfection methods are described, e.g., in Ausubel, et al.,
Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1992.
Expression systems include in vitro gene expression assays where a gene of
interest (e.g., a reporter gene) is linked to a regulatory sequence and the
expression of the gene is monitored following treatment with an agent that
inhibits or induces expression of the gene. Detection of gene expression can be
through any suitable means including, but not limited to, detection of expressed
mRNA or protein (e.g., a detectable product of a reporter gene) or through a
detectable change in the phenotype of a cell expressing the gene of interest.
Expression systems may also comprise assays where a cleavage event or other
nucleic acid or cellular change is detected.

The term "enzyme" refers to molecules or molecule aggregates that are
responsible for catalyzing chemical and biological reactions. Such molecules are
typically proteins, but can also comprise short peptides, RNAs, ribozymes,
antibodies, and other molecules. A molecule that catalyzes chemical and
biological reactions is referred to as "having enzyme activity" or "having
catalytic activity."

All amino acid residues identified herein are in the natural
L-configuration. In keeping with standard polypeptide nomenclature (see J.
Biol. Chem., 243, 3557 (1969)), abbreviations for amino acid residues are as
shown in the following Table of Correspondence.



TABLE OF CORRESPONDENCE

1-Letter    3-Letter    AMINO ACID
Y           Tyr         L-tyrosine
G           Gly         glycine
F           Phe         L-phenylalanine
M           Met         L-methionine
A           Ala         L-alanine
S           Ser         L-serine
I           Ile         L-isoleucine
L           Leu         L-leucine
T           Thr         L-threonine
V           Val         L-valine
P           Pro         L-proline
K           Lys         L-lysine
H           His         L-histidine
Q           Gln         L-glutamine
E           Glu         L-glutamic acid
W           Trp         L-tryptophan
R           Arg         L-arginine
D           Asp         L-aspartic acid
N           Asn         L-asparagine
C           Cys         L-cysteine

The term "sequence homology" means the proportion of base matches
between two nucleic acid sequences or the proportion of amino acid matches
between two amino acid sequences. When sequence homology is expressed as a
percentage, e.g., 50%, the percentage denotes the proportion of matches over the
length of sequence from one sequence that is compared to some other sequence.
Gaps (in either of the two sequences) are permitted to maximize matching; gap
lengths of 15 bases or less are usually used, 6 bases or less are preferred with
2 bases or less more preferred. When using oligonucleotides as probes or
treatments, the sequence homology between the target nucleic acid and the
oligonucleotide sequence is generally not less than 17 target base matches out of
20 possible oligonucleotide base pair matches (85%); preferably not less than 9
matches out of 10 possible base pair matches (90%), and more preferably not less
than 19 matches out of 20 possible base pair matches (95%).
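
The match-counting arithmetic above (e.g., 17 matches out of 20 positions is 85%) can be sketched in Python. This minimal illustration assumes the oligonucleotide and the target segment are already aligned without gaps, have equal length, and are written in the same orientation; the names base_matches, percent_homology and meets_threshold, and the example sequences, are hypothetical.

```python
def base_matches(oligo, target_segment):
    """Count positions at which two equal-length aligned sequences agree."""
    if len(oligo) != len(target_segment):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(oligo.upper(), target_segment.upper()))


def percent_homology(oligo, target_segment):
    """Matches expressed as a percentage of the oligonucleotide length."""
    return 100.0 * base_matches(oligo, target_segment) / len(oligo)


def meets_threshold(oligo, target_segment, min_percent=85.0):
    """True if the homology is not less than min_percent."""
    return percent_homology(oligo, target_segment) >= min_percent


# Hypothetical 20-mer with three mismatches at its 3' end: 17 of 20 match.
oligo = "ACGTACGTACGTACGTACGT"
target = "ACGTACGTACGTACGTATTA"
matches = base_matches(oligo, target)  # 17
```

With these inputs the 85% level is met, while the 90% and 95% levels are not. Handling of gaps, which the text permits, is outside the scope of this sketch.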

Two amino acid sequences are homologous if there is a partial or
complete identity between their sequences. For example, 85% homology means
that 85% of the amino acids are identical when the two sequences are aligned for
maximum matching. Gaps (in either of the two sequences being matched) are
allowed in maximizing matching; gap lengths of 5 or less are preferred with 2 or
less being more preferred. Alternatively and preferably, two protein sequences
(or polypeptide sequences derived from them of at least 100 amino acids in
length) are homologous, as this term is used herein, if they have an alignment
score of more than 5 (in standard deviation units) using the program ALIGN
with the mutation data matrix and a gap penalty of 6 or greater. See Dayhoff, M.
O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National
Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this
volume, pp. 1-10. The two sequences or parts thereof are more preferably
homologous if their amino acids are greater than or equal to 85% identical when
optimally aligned using the ALIGN program.

The following terms are used to describe the sequence relationships 
between two or more polynucleotides: "reference sequence", "comparison 
5 window", "sequence identity", "percentage of sequence identity", and 

"substantial identity". A "reference sequence" is a defined sequence used as a 
basis for a sequence comparison; a reference sequence may be a subset of a 
larger sequence, for example, as a segment of a full-length cDNA or gene 
sequence given in a sequence listing, or may comprise a complete cDNA or gene 

10 sequence. Generally, a reference sequence is at least 20 nucleotides in length, 
frequently at least 25 nucleotides in length, and often at least 50 nucleotides in 
length. Since two polynucleotides may each (1) comprise a sequence (i.e., a 
portion of the complete polynucleotide sequence) that is similar between the two 
polynucleotides, and (2) may further comprise a sequence that is divergent
between the two polynucleotides, sequence comparisons between two (or more)
polynucleotides are typically performed by comparing sequences of the two 
polynucleotides over a "comparison window" to identify and compare local 
regions of sequence similarity. 

A "comparison window", as used herein, refers to a conceptual segment
of at least 20 contiguous nucleotides, wherein the portion of the
polynucleotide sequence in the comparison window may comprise additions or
deletions (i.e., gaps) of 20 percent or less as compared to the reference sequence 
(which does not comprise additions or deletions) for optimal alignment of the 
two sequences. 

Methods of alignment of sequences for comparison are well known in the
art. Thus, the determination of percent identity between any two sequences can
be accomplished using a mathematical algorithm. Preferred, non-limiting
examples of such mathematical algorithms are the algorithm of Myers and Miller
(1988); the local homology algorithm of Smith and Waterman (1981); the
homology alignment algorithm of Needleman and Wunsch (1970); the
search-for-similarity method of Pearson and Lipman (1988); and the algorithm of
Karlin and Altschul (1990), modified as in Karlin and Altschul (1993).

Computer implementations of these mathematical algorithms can be
utilized for comparison of sequences to determine sequence identity. Such 
implementations include, but are not limited to: CLUSTAL in the PC/Gene 
program (available from Intelligenetics, Mountain View, California); the ALIGN 
program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in
the Wisconsin Genetics Software Package, Version 8 (available from Genetics 
Computer Group (GCG), 575 Science Drive, Madison, Wisconsin, USA). 
Alignments using these programs can be performed using the default parameters. 
The CLUSTAL program is well described by Higgins et al. (1988); Higgins et
al. (1989); Corpet et al. (1988); Huang et al. (1992); and Pearson et al. (1994).
The ALIGN program is based on the algorithm of Myers and Miller, supra. The 
BLAST programs of Altschul et al. (1990), are based on the algorithm of Karlin 
and Altschul, supra. To obtain gapped alignments for comparison purposes,
Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al.
(1997). Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an
iterated search that detects distant relationships between molecules. See
Altschul et al., supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST,
the default parameters of the respective programs (e.g., BLASTN for nucleotide
sequences, BLASTX for proteins) can be used. See
http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by
inspection.

The term "sequence identity" means that two polynucleotide sequences 
are identical (i.e., on a nucleotide-by-nucleotide basis) over the window of 
comparison. The term "percentage of sequence identity" means that two
polynucleotide sequences are identical (i.e., on a nucleotide-by-nucleotide basis)
for the stated proportion of nucleotides over the window of comparison. The
"percentage of sequence identity" is calculated by comparing two optimally
aligned sequences over the window of comparison, determining the number of
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I)
occurs in both sequences to yield the number of matched positions, dividing the
number of matched positions by the total number of positions in the window of
comparison (i.e., the window size), and multiplying the result by 100 to yield the
percentage of sequence identity. The term "substantial identity" as used herein
denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide
comprises a sequence that has at least 60%, preferably at least 65%, more 
preferably at least 70%, up to about 85%, and even more preferably at least 90 to 
95%, more usually at least 99%, sequence identity as compared to a reference
sequence over a comparison window of at least 20 nucleotide positions, 
frequently over a window of at least 20-50 nucleotides, and preferably at least 
300 nucleotides, wherein the percentage of sequence identity is calculated by 
comparing the reference sequence to the polynucleotide sequence which may
include deletions or additions which total 20 percent or less of the reference
sequence over the window of comparison. The reference sequence may be a 
subset of a larger sequence. 
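
By way of non-limiting illustration, the calculation of percentage of sequence identity described above can be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned by one of the programs named herein and padded with '-' at gap positions, so that both strings span the same comparison window; the function name percent_identity is hypothetical and is not part of any of those programs.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percentage of sequence identity over a comparison window.

    Assumes both strings are already aligned, equal in length, and use
    '-' for gap positions. A gap never counts as a matched position.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    window_size = len(aligned_ref)
    if window_size == 0:
        raise ValueError("comparison window is empty")
    # Number of positions at which the identical base occurs in both sequences.
    matched = sum(
        1
        for a, b in zip(aligned_ref.upper(), aligned_query.upper())
        if a == b and a != "-"
    )
    # Matched positions divided by window size, multiplied by 100.
    return 100.0 * matched / window_size


# 19 matches out of 20 positions
print(percent_identity("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))  # 95.0
```

Gap positions are counted in the window size but never as matches, which is one possible reading of the definition above; other conventions for handling gaps exist.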

As applied to polypeptides, the term "substantial identity" means that two 
peptide sequences, when optimally aligned, such as by the programs GAP or
BESTFIT using default gap weights, share at least about 85% sequence identity,
preferably at least about 90% sequence identity, more preferably at least about
95% sequence identity, and most preferably at least about 99% sequence
identity.

The Synthetic Nucleic Acid Molecules and Methods of the Invention

The invention provides compositions comprising synthetic nucleic acid 
molecules, as well as methods for preparing those molecules which yield 
synthetic nucleic acid molecules that are efficiently expressed as a polypeptide or 
protein with desirable characteristics including reduced inappropriate or
unintended transcription characteristics when expressed in a particular cell type.

Natural selection is the hypothesis that genotype-environment
interactions occurring at the phenotypic level lead to differential reproductive 
success of individuals and hence to modification of the gene pool of a 
population. It is generally accepted that the amino acid sequence of a protein
found in nature has undergone optimization by natural selection. However,
amino acids exist within the sequence of a protein that do not contribute
significantly to the activity of the protein and these amino acids can be changed
to other amino acids with little or no consequence. Furthermore, a protein may
be useful outside its natural environment or for purposes that differ from the 
conditions of its natural selection. In these circumstances, the amino acid 
sequence can be synthetically altered to better adapt the protein for its utility in 
various applications.

Likewise, the nucleic acid sequence that encodes a protein is also 
optimized by natural selection. The relationship between coding DNA and its 
transcribed RNA is such that any change to the DNA affects the resulting RNA. 
Thus, natural selection works on both molecules simultaneously. However, this
relationship does not exist between nucleic acids and proteins. Because multiple
codons encode the same amino acid, many different nucleotide sequences can
encode an identical protein. A specific protein composed of 500 amino acids can
theoretically be encoded by more than 10^150 different nucleic acid sequences.
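
As a rough, non-limiting check on the order of magnitude stated above, the number of nucleotide sequences encoding a given protein is the product, over its residues, of the number of synonymous codons for each residue. The Python sketch below assumes the standard genetic code; the names CODON_TO_AA, DEGENERACY and count_encodings are hypothetical. If each of 500 residues had only two synonymous codons, the count would already be 2^500, or about 3 x 10^150.

```python
from itertools import product
from math import log10, prod

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (1 for M and W, up to 6 for L, S, R).
DEGENERACY = {}
for aa in CODON_TO_AA.values():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def count_encodings(protein: str) -> int:
    """Number of distinct coding sequences for a one-letter protein string."""
    return prod(DEGENERACY[aa] for aa in protein.upper())


print(count_encodings("MLSR"))   # 1 * 6 * 6 * 6 = 216
print(log10(2 ** 500))           # about 150.5
```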

Natural selection acts on nucleic acids to achieve proper encoding of the
corresponding protein. Presumably, other properties of nucleic acid molecules
are also acted upon by natural selection. These properties include codon usage 
frequency, RNA secondary structure, the efficiency of intron splicing, and 
interactions with transcription factors or other nucleic acid binding proteins. 
These other properties may alter the efficiency of protein translation and the
resulting phenotype. Because of the redundant nature of the genetic code, these
other attributes can be optimized by natural selection without altering the 
corresponding amino acid sequence. 

Under some conditions, it is useful to synthetically alter the natural 
nucleotide sequence encoding a protein to better adapt the protein for alternative
applications. A common example is to alter the codon usage frequency of a gene
when it is expressed in a foreign host. Although redundancy in the genetic code 
allows amino acids to be encoded by multiple codons, different organisms favor 
some codons over others. The codon usage frequencies tend to differ most for 
organisms with widely separated evolutionary histories. It has been found that
when transferring genes between evolutionarily distant organisms, the efficiency
of protein translation can be substantially increased by adjusting the codon usage
frequency (see U.S. Patent Nos. 5,096,825, 5,670,356 and 5,874,304).

Because of the need for evolutionary distance, the codon usage of
reporter genes often does not correspond to the optimal codon usage of the
experimental cells. Examples include β-galactosidase (β-gal) and
chloramphenicol acetyltransferase (cat) reporter genes that are derived from E.
coli and are commonly used in mammalian cells; the β-glucuronidase (gus)
reporter gene that is derived from E. coli and commonly used in plant cells; the
firefly luciferase (luc) reporter gene that is derived from an insect and commonly
used in plant and mammalian cells; and the Renilla luciferase and green
fluorescent protein (gfp) reporter genes which are derived from coelenterates and
are commonly used in plant and mammalian cells. To achieve sensitive
quantitation of reporter gene expression, the activity of the gene product must
not be endogenous to the experimental host cells. Thus, reporter genes are 
usually selected from organisms having unique and distinctive phenotypes. 
Consequently, these organisms often have widely separated evolutionary
histories from the experimental host cells.

Previously, to create genes having a more optimal codon usage frequency 
but still encoding the same gene product, a synthetic nucleic acid sequence was 
made by replacing existing codons with codons that were generally more 
favorable to the experimental host cell (see U.S. Patent Nos. 5,096,825,
5,670,356 and 5,874,304). The result was a net improvement in codon usage
frequency of the synthetic gene. However, the optimization of other attributes 
was not considered and so these synthetic genes likely did not reflect genes 
optimized by natural selection. 
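
The earlier codon-replacement approach described above can be illustrated, in a deliberately simplified form, by the following Python sketch. It assumes the standard genetic code and a caller-supplied table of one preferred codon per amino acid. The PREFERRED table shown is a hypothetical placeholder for illustration only and is not measured codon usage data for any host; the function names are likewise hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence (length divisible by 3) to one-letter code."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA[codon]
        choice = preferred.get(aa, codon)  # keep the codon if no preference
        if CODON_TO_AA[choice] != aa:
            raise ValueError(f"{choice} does not encode {aa}")
        out.append(choice)
    return "".join(out)


# Hypothetical preference table (placeholder values, not real usage data).
PREFERRED = {"L": "CTG", "S": "AGC", "T": "ACC", "G": "GGC"}

native = "ATGTTATCAACTGGT"           # encodes M L S T G
synthetic = recode(native, PREFERRED)
print(synthetic)                      # ATGCTGAGCACCGGC
print(translate(native) == translate(synthetic))  # True
```

Note that this substitution considers codon preference alone, which is the limitation of the earlier approach that the text above identifies.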

In particular, improvements in codon usage frequency are intended only
for optimization of an RNA sequence based on its role in translation into a
protein. Thus, previously described methods did not address how the sequence
of a synthetic gene affects the role of DNA in transcription into RNA. Most
notably, consideration had not been given as to how transcription factors may
interact with the synthetic DNA and consequently modulate or otherwise
influence gene transcription. For genes found in nature, the DNA would be
optimally transcribed by the native host cell and would yield an RNA that
encodes a properly folded gene product. In contrast, synthetic genes have
previously not been optimized for transcriptional characteristics. Rather, this
property has been ignored or left to chance. 

This concern is important for all genes, but particularly important for 
reporter genes, which are most commonly used to quantitate transcriptional
behavior in the experimental host cells. Hundreds of transcription factors have
been identified in different cell types under different physiological conditions, 
and likely more exist but have not yet been identified. All of these transcription 
factors can influence the transcription of an introduced gene. A useful synthetic 
reporter gene of the invention has a minimal risk of influencing or perturbing
intrinsic transcriptional characteristics of the host cell because the structure of
that gene has been altered. A particularly useful synthetic reporter gene will 
have desirable characteristics under a new set and/or a wide variety of 
experimental conditions. To best achieve these characteristics, the structure of 
the synthetic gene should have minimal potential for interacting with
transcription factors within a broad range of host cells and physiological
conditions. Minimizing potential interactions between a reporter gene and a host
cell's endogenous transcription factors increases the value of a reporter gene by 
reducing the risk of inappropriate transcriptional characteristics of the gene 
within a particular experiment, increasing applicability of the gene in various
environments, and increasing the acceptance of the resulting experimental data.

In contrast, a reporter gene comprising a native nucleotide sequence, 
based on a genomic or cDNA clone from the original host organism, may 
interact with transcription factors when expressed in an exogenous host. This 
risk stems from two circumstances. First, the native nucleotide sequence
contains sequences that were optimized through natural selection to influence
gene transcription within the native host organism. However, these sequences 
might also influence transcription when the gene is expressed in exogenous 
hosts, i.e., out of context, thus interfering with its performance as a reporter gene. 
Second, the nucleotide sequence may inadvertently interact with transcription
factors that were not present in the native host organism, and thus did not
participate in its natural selection. The probability of such inadvertent
interactions increases with greater evolutionary separation between the
experimental cells and the native organism of the reporter gene. 

These potential interactions with transcription factors would likely be 
disrupted when using a synthetic reporter gene having alterations in codon usage
frequency. However, a synthetic reporter gene sequence, designed by choosing
codons based only on codon usage frequency, is likely to contain other
unintended transcription factor binding sites since the synthetic gene has not
been subjected to the benefit of natural selection to correct inappropriate
transcriptional activities. Inadvertent interactions with transcription factors
could also occur whenever the encoded amino acid sequence is artificially
altered, e.g., to introduce amino acid substitutions. Similarly, these changes
have not been subjected to natural selection, and thus may exhibit undesired 
characteristics. 

Thus, the invention provides a method for preparing synthetic nucleic
acid sequences that reduce the risk of undesirable interactions of the nucleic acid
with transcription factors when expressed in a particular host cell, thereby
reducing inappropriate or unintended transcriptional characteristics. Preferably,
the method yields synthetic genes containing improved codon usage frequencies
for a particular host cell and with a reduced occurrence of transcription factor
binding sites. The invention also provides a method of preparing synthetic
genes containing improved codon usage frequencies with a reduced occurrence
of transcription factor binding sites and additional beneficial structural attributes.
Such additional attributes include the absence of inappropriate RNA splicing
junctions, poly(A) addition signals, undesirable restriction sites, ribosomal
binding sites, and secondary structural motifs such as hairpin loops.

Also provided is a method for preparing two synthetic genes encoding
the same or highly similar proteins ("codon distinct" versions). Preferably, the
two synthetic genes have a reduced ability to hybridize to a common
polynucleotide probe sequence, or have a reduced risk of recombining when
present together in living cells. To detect recombination, PCR amplification of
the reporter sequences using primers complementary to flanking sequences and
sequencing of the amplified sequences may be employed.

To select codons for the synthetic nucleic acid molecules of the
invention, preferred codons have a relatively high codon usage frequency in a 
selected host cell, and their introduction results in the introduction of relatively 
few transcription factor binding sites, relatively few other undesirable structural
attributes, and optionally a characteristic that distinguishes the synthetic gene
from another gene encoding a highly similar protein. Thus, the synthetic nucleic
acid product obtained by the method of the invention is a synthetic gene with
an improved level of expression due to improved codon usage frequency, a reduced
risk of inappropriate transcriptional behavior due to a reduced number of
undesirable transcription regulatory sequences, and optionally any additional
characteristic due to other criteria that may be employed to select the synthetic 
sequence. 

The invention may be employed with any nucleic acid sequence, e.g., a 
native sequence such as a cDNA or one which has been manipulated in vitro,
e.g., to introduce specific alterations such as the introduction or removal of a
restriction enzyme recognition site, the alteration of a codon to encode a different
amino acid or to encode a fusion protein, or to alter GC or AT content (% of 
composition) of nucleic acid molecules. Moreover, the method of the invention 
is useful with any gene, but particularly useful for reporter genes as well as other
genes associated with the expression of reporter genes, such as selectable
markers. Preferred genes include, but are not limited to, those encoding
lactamase (β-gal), neomycin resistance (Neo), CAT, GUS, galactopyranoside,
GFP, xylosidase, thymidine kinase, arabinosidase and the like. As used herein, a
"marker gene" or "reporter gene" is a gene that imparts a distinct phenotype to
cells expressing the gene and thus permits cells having the gene to be
distinguished from cells that do not have the gene. Such genes may encode
either a selectable or screenable marker, depending on whether the marker
confers a trait which one can 'select' for by chemical means, i.e., through the use
of a selective agent (e.g., a herbicide, antibiotic, or the like), or whether it is
simply a "reporter" trait that one can identify through observation or testing, i.e.,
by 'screening'. Elements of the present disclosure are exemplified in detail
through the use of particular marker genes. Of course, many examples of
suitable marker genes or reporter genes are known to the art and can be
employed in the practice of the invention. Therefore, it will be understood that 
the following discussion is exemplary rather than exhaustive. In light of the 
techniques disclosed herein and the general recombinant techniques which are
known in the art, the present invention renders possible the alteration of any
gene. 

Exemplary marker genes include, but are not limited to, a neo gene, a
β-gal gene, a gus gene, a cat gene, a gpt gene, a hyg gene, a hisD gene, a ble gene,
a mprt gene, a bar gene, a nitrilase gene, a mutant acetolactate synthase gene
(ALS) or acetoacid synthase gene (AAS), a methotrexate-resistant dhfr gene, a
dalapon dehalogenase gene, a mutated anthranilate synthase gene that confers
resistance to 5-methyl tryptophan (WO 97/26366), an R-locus gene, a
β-lactamase gene, a xylB gene, an α-amylase gene, a tyrosinase gene, a luciferase
(luc) gene (e.g., a Renilla reniformis luciferase gene, a firefly luciferase gene, or
a click beetle luciferase (Pyrophorus plagiophthalamus) gene), an aequorin gene,
or a green fluorescent protein gene. Included within the terms selectable or 
screenable marker genes are also genes which encode a "secretable marker" 
whose secretion can be detected as a means of identifying or selecting for 
transformed cells. Examples include markers which encode a secretable antigen
that can be identified by antibody interaction, or even secretable enzymes which
can be detected by their catalytic activity. Secretable proteins fall into a number 
of classes, including small, diffusible proteins detectable, e.g., by ELISA, and 
proteins that are inserted or trapped in the cell membrane. 

The method of the invention can be performed by, although it is not
limited to, a recursive process. The process includes assigning preferred codons
to each amino acid in a target molecule, e.g., a native nucleotide sequence, based
on codon usage in a particular species, identifying potential transcription
regulatory sequences such as transcription factor binding sites in the nucleic acid
sequence having preferred codons, e.g., using a database of such binding sites,
optionally identifying other undesirable sequences, and substituting an
alternative codon (i.e., encoding the same amino acid) at positions where
undesirable transcription factor binding sites or other sequences occur. For
codon distinct versions, alternative preferred codons are substituted in each
version. If necessary, the identification and elimination of potential transcription
factor binding sites or other undesirable sequences can be repeated until a nucleotide sequence
is achieved containing a maximum number of preferred codons and a minimum
number of undesired sequences including transcription regulatory sequences or
other undesirable sequences. Also, optionally, desired sequences, e.g.,
restriction enzyme recognition sites, can be introduced. After a synthetic nucleic
acid molecule is designed and constructed, its properties relative to the parent
nucleic acid sequence can be determined by methods well known to the art. For
example, the expression of the synthetic and target nucleic acid molecules in a
series of vectors in a particular cell can be compared. 
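
The identify-and-substitute loop described above can be sketched, in a deliberately minimal and non-limiting form, as follows. The sketch assumes the standard genetic code and represents each undesired sequence as a literal string, whereas in practice binding sites would be looked up in a database and alternative codons would be ranked by host codon usage. The function names are hypothetical, and the motif TGACGTCA is used only as an example of an unwanted sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_hits(seq, motifs):
    """Count (possibly overlapping) occurrences of every motif in seq."""
    total = 0
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total


def first_hit(seq, motifs):
    """Return (start, end) of the leftmost motif occurrence, or None."""
    spans = [(seq.find(m), seq.find(m) + len(m)) for m in motifs if m in seq]
    return min(spans) if spans else None


def remove_motifs(cds, motifs, max_rounds=100):
    """Repeatedly swap in synonymous codons until no listed motif remains."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for _ in range(max_rounds):
        seq = "".join(codons)
        hit = first_hit(seq, motifs)
        if hit is None:
            break
        before = count_hits(seq, motifs)
        improved = False
        # Try every codon that overlaps the leftmost hit.
        for idx in range(hit[0] // 3, (hit[1] - 1) // 3 + 1):
            aa = CODON_TO_AA[codons[idx]]
            for alt in (c for c, a in CODON_TO_AA.items() if a == aa):
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                # Accept only a change that lowers the total motif count.
                if count_hits("".join(trial), motifs) < before:
                    codons, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            break  # no single synonymous change helps; stop
    return "".join(codons)


cleaned = remove_motifs("ATGACGTCAGGC", ["TGACGTCA"])
print("TGACGTCA" in cleaned)  # False
```

Because a substitution is accepted only when the total motif count falls, a change that removes one site cannot silently create another; the encoded amino acid sequence is unchanged by construction.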

Thus, generally, the method of the invention comprises identifying a 
target nucleic acid sequence, such as a vector backbone, a reporter gene or a 
selectable marker gene, and a host cell of interest, for example, a plant (dicot or
monocot), fungus, yeast or mammalian cell. Preferred host cells are mammalian
host cells such as CHO, COS, 293, HeLa, CV-1 and NIH3T3 cells. Based on
preferred codon usage in the host cell(s) and, optionally, low codon usage in the
host cell(s), e.g., high usage mammalian codons and low usage E. coli and
mammalian codons, codons to be replaced are determined. For codon distinct
versions of two synthetic nucleic acid molecules, alternative preferred codons are
introduced to each version. Thus, for amino acids having more than two codons, 
one preferred codon is introduced to one version and another preferred codon is 
introduced to the other version. For amino acids having six codons, the two 
codons with the largest number of mismatched bases are identified and one is
introduced to one version and the other codon is introduced to the other version.
Concurrent with, subsequent to or prior to selecting codons to be replaced, desired and
undesired sequences, such as undesired transcriptional regulatory sequences, in
the target sequence are identified. These sequences can be identified using
databases and software such as EPD, NNPD, REBASE, TRANSFAC, TESS,
GenePro, MAR (www.ncgr.org/MAR-search) and BCM Gene Finder, further
described herein. After the sequences are identified, the modification(s) are
introduced. Once a desired synthetic nucleic acid sequence is obtained, it can be
prepared by methods well known to the art (such as PCR with overlapping
primers), and its structural and functional properties compared to the target 
nucleic acid sequence, including, but not limited to, percent homology, presence 
or absence of certain sequences, for example, restriction sites, percent of codons 
changed (such as an increased or decreased usage of certain codons) and
expression rates. 
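
For the codon distinct versions described above, the selection of the two synonymous codons having the largest number of mismatched bases can be sketched as follows. This non-limiting Python sketch assumes the standard genetic code and, for simplicity, ignores codon usage preference in the host cell, which the method described herein also takes into account; the function names are hypothetical.

```python
from itertools import combinations, product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def mismatches(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def most_divergent_pair(aa: str):
    """Pair of synonymous codons for aa with the most mismatched bases."""
    codons = [c for c, a in CODON_TO_AA.items() if a == aa]
    if len(codons) < 2:
        raise ValueError(f"{aa} has fewer than two codons")
    return max(combinations(codons, 2), key=lambda pair: mismatches(*pair))


for aa in "LSR":  # the three amino acids having six codons
    pair = most_divergent_pair(aa)
    print(aa, pair, mismatches(*pair))
```

For serine the two codon families (TCN and AGT/AGC) allow a pair that differs at all three positions, whereas for leucine and arginine the shared middle base limits the difference to two positions.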

As described below, the method was used to create synthetic reporter 
genes encoding Renilla reniformis luciferase, and two click beetle luciferases 
(one emitting green light and the other emitting red light). For both systems, the
synthetic genes support much greater levels of expression than the corresponding
native or parent genes for the protein. In addition, the native and parent genes 
demonstrated anomalous transcription characteristics when expressed in 
mammalian cells, which were not evident in the synthetic genes. In particular, 
basal expression of the native or parent genes is relatively high. Furthermore,
the expression is induced to very high levels by an enhancer sequence in the
absence of known promoters. The synthetic genes show lower basal expression
and do not show the anomalous enhancer behavior. Presumably, the enhancer is 
activating transcriptional elements found in the native genes that are absent in 
the synthetic genes. The results clearly show that the synthetic nucleic acid
sequences exhibit superior performance as reporter genes.

Exemplary Uses of the Molecules of the Invention 

The synthetic genes of the invention preferably encode the same proteins 
as their native counterparts (or nearly so), but have improved codon usage while
being largely devoid of known transcription regulatory elements in the coding
region. (It is recognized that a small number of amino acid changes may be 
desired to enhance a property of the native counterpart protein, e.g. to enhance 
luminescence of a luciferase.) This increases the level of expression of the 
protein the synthetic gene encodes and reduces the risk of anomalous expression
of the protein. For example, studies of many important events of gene
regulation, which may be mediated by weak promoters, are limited by
insufficient reporter signals from inadequate expression of the reporter proteins.
The synthetic luciferase genes described herein permit detection of weak
promoter activity because of the large increase in level of expression, which 
enables increased detection sensitivity. Also, the use of some selectable markers 
may be limited by the expression of that marker in an exogenous cell. Thus,
synthetic selectable marker genes which have improved codon usage for that
cell, and have a decrease in other undesirable sequences (e.g., transcription
factor binding sites), can permit the use of those markers in cells that otherwise
were undesirable as hosts for those markers.

Promoter crosstalk is another concern when a co-reporter gene is used to
normalize transfection efficiencies. With the enhanced expression of synthetic
genes, the amount of DNA containing strong promoters can be reduced, or DNA
containing weaker promoters can be employed, to drive the expression of the
co-reporter. In addition, there may be a reduction in the background expression
from the synthetic reporter genes of the invention. This characteristic makes
synthetic reporter genes more desirable by minimizing the sporadic expression
from the genes and reducing the interference resulting from other regulatory 
pathways. 
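
As a simple, non-limiting illustration of the normalization referred to above, the signal from an experimental reporter may be divided by the signal from a co-reporter measured in the same sample, so that differences in transfection efficiency between samples cancel. The Python sketch below uses hypothetical function names and made-up luminescence readings chosen only to show the arithmetic.

```python
def normalized_activity(experimental: float, co_reporter: float) -> float:
    """Experimental reporter signal relative to the co-reporter signal."""
    if co_reporter <= 0:
        raise ValueError("co-reporter signal must be positive")
    return experimental / co_reporter


def fold_induction(treated: tuple, control: tuple) -> float:
    """Ratio of normalized activities; each argument is (experimental, co_reporter)."""
    return normalized_activity(*treated) / normalized_activity(*control)


# Made-up readings in relative light units, for illustration only.
print(fold_induction(treated=(5000.0, 250.0), control=(1000.0, 200.0)))  # 4.0
```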

The use of reporter genes in imaging systems, which can be used for in 
vivo biological studies or drug screening, is another use for the synthetic genes of
the invention. Due to the increased level of expression, the protein encoded by
a synthetic gene is more readily detectable by an imaging system. In fact, using 
a synthetic Renilla luciferase gene, luminescence in transfected CHO cells was 
detected visually without the aid of instrumentation. 

In addition, the synthetic genes may be used to express fusion proteins,
for example fusions with secretion leader sequences or cellular localization
sequences, to study transcription in difficult-to-transfect cells such as primary 
cells, and/or to improve the analysis of regulatory pathways and genetic 
elements. Other uses include, but are not limited to, the detection of rare events 
that require extreme sensitivity (e.g., studying RNA recoding), use with IRES, to
improve the efficiency of in vitro translation or in vitro transcription-translation
coupled systems such as TNT (Promega Corp., Madison, WI), study of reporters
optimized to different host organisms (e.g., plants, fungus, and the like), use of
multiple genes as co-reporters to monitor drug toxicity, as reporter molecules in
multiwell assays, and as reporter molecules in drug screening with the advantage
of minimizing possible interference of reporter signal by different signal
transduction pathways and other regulatory mechanisms.

Additionally, uses for the nucleic acid molecules of the invention include
fluorescence activated cell sorting (FACS), fluorescent microscopy, to detect
and/or measure the level of gene expression in vitro and in vivo, (e.g., to 
determine promoter strength), subcellular localization or targeting (fusion 
protein), as a marker, in calibration, in a kit (e.g., for dual assays), for in vivo
imaging, to analyze regulatory pathways and genetic elements, and in multi-well
formats. 

With respect to synthetic DNA encoding luciferases, the use of synthetic 
click beetle luciferases provides advantages such as the measurement of dual 
reporters. As Renilla luciferase is better suited for in vivo imaging (because it
does not depend on ATP or Mg2+ for reaction, unlike firefly luciferase, and
because coelenterazine is more permeable to the cell membrane than luciferin),
the synthetic Renilla luciferase gene can be employed in vivo. Further, the
synthetic Renilla luciferase has improved fidelity and sensitivity in dual
luciferase assays, e.g., for biological analysis or in a drug screening platform.

Demonstration of the Invention Using Luciferase Genes

The reporter genes for click beetle luciferase and Renilla luciferase were
used to demonstrate the invention because the reactions catalyzed by the proteins
they encode are significantly easier to quantify than the product of most genes.
However, for the purposes of demonstrating the present invention they represent
genes in general.

Although the click beetle luciferase and Renilla luciferase genes share the 
name "luciferase", this should not be interpreted to mean that they originate from 
the same family of genes. The two luciferase proteins are evolutionarily
distinct; they have fundamentally different traits and physical structures, they use
vastly different substrates (Figure 17), and they evolved from completely
different families of genes. The click beetle luciferase is 61 kD in size, uses
luciferin as a substrate and evolved from the CoA synthetases. The Renilla
luciferase originates from the sea pansy Renilla reniformis, is 35 kD in size,
uses coelenterazine as a substrate and evolved from the α/β hydrolases. The only
shared trait of these two enzymes is that the reaction they catalyze results in light
output. They are no more similar for resulting in light output than any other two
enzymes would be, for example, simply because the reaction they catalyze
results in heat.

Bioluminescence is the light produced in certain organisms as a result of 
luciferase-mediated oxidation reactions. The luciferase genes, e.g., the genes
from luminous beetles, sea pansy, and, in particular, the luciferase from Photinus
pyralis (the common firefly of North America), are currently the most popular 
luminescent reporter genes. Reference is made to Bronstein et al. (1994) for a 
review of luminescent reporter gene assays and to Wood (1995) for a review of 
the evolution of beetle bioluminescence. See Figure 17 for an illustration of the 

1 5 reactions catalyzed by each of firefly and click beetle luciferases (17A) and 
Renilla luciferase (17B). 

Firefly luciferase and Renilla luciferase are highly valuable as genetic
reporters due to the convenience, sensitivity and linear range of the luminescence
assay. Today, luciferase is used in virtually every type of experimental
biological system, including, but not limited to, prokaryotic and eukaryotic cell
culture, transgenic plants and animals, and cell-free expression systems. The
firefly luciferase enzyme is derived from a specific North American beetle,
Photinus pyralis. The firefly luciferase enzyme and the click beetle luciferase
enzyme are monomeric proteins (61 kDa) which generate light through
monooxygenation of beetle luciferin utilizing ATP and O2 (Figure 17A). The
Renilla luciferase is derived from the sea pansy Renilla reniformis. The Renilla
luciferase enzyme is a 36 kDa monomeric protein that utilizes O2 and
coelenterazine to generate light (Figure 17B).

The gene encoding firefly luciferase was cloned from Photinus pyralis,
and demonstrated to produce active enzyme in E. coli (de Wet et al., 1987). The
cDNA encoding firefly luciferase (luc) continues to gain favor as the gene of
choice for reporting genetic activity in animal, plant and microbial cells. The
firefly luciferase reaction, modified by the addition of CoA to produce persistent
light emission, provides an extremely sensitive and rapid in vitro assay for
quantifying firefly luciferase expression in small samples of transfected cells or
tissues.

To use firefly luciferase or click beetle luciferase as a genetic reporter,
extracts of cells expressing the luciferase are mixed with substrates (beetle
luciferin, Mg2+, ATP, and O2), and luminescence is measured immediately. The
assay is very rapid and sensitive, providing gene expression data with little
effort. The conventional firefly luciferase assay has been further improved by
including coenzyme A in the assay reagent to yield greater enzyme turnover and
thus greater luminescence intensity (Promega Luciferase Assay Reagent, Cat#
E1500, Promega Corporation, Madison, Wis.). Using this reagent, luciferase
activity can be readily measured in luminometers or scintillation counters.
Firefly and click beetle luciferase activity can also be detected in living cells in
culture by adding luciferin to the growth medium. This in situ luminescence
relies on the ability of beetle luciferin to diffuse through cellular and
peroxisomal membranes and on the intracellular availability of ATP and O2 in
the cytosol and peroxisome.

Further, although reporter genes are widely used to measure transcription
events, their utility can be limited by the fidelity and efficiency of reporter
expression. For example, in U.S. Patent No. 5,670,356, a firefly luciferase gene
(referred to as luc+) was modified to improve the level of luciferase expression.
While a higher level of expression was observed, it was not determined that
higher expression had improved regulatory control.

The invention will be further described by the following nonlimiting
examples.

Example 1 

Synthetic Click Beetle (RD and GR) Luciferase Nucleic Acid Molecules

LucPplYG is a wild-type click beetle luciferase that emits yellow-green
luminescence (Wood, 1989). A mutant of LucPplYG named YG#81-6G01 was
envisioned. YG#81-6G01 lacks a peroxisome targeting signal, has a lower Km
for luciferin and ATP, has increased signal stability and increased temperature
stability when compared to the wild type (PCT/WO99/14336). YG#81-6G01
was mutated to emit green luminescence by changing Ala at position 224 to Val
(A224V is a green-shifting mutation), or to emit red luminescence by
simultaneously introducing the amino acid substitutions A224H, S247H, N346I,
and H348Q (red-shifting mutation set) (PCT/WO95/18853).

Using YG#81-6G01 as a parent gene, two synthetic gene sequences were
designed. One codes for a luciferase emitting green luminescence (GR) and one
for a luciferase emitting red luminescence (RD). Both genes were designed to 1)
have optimized codon usage for expression in mammalian cells, 2) have a
reduced number of transcriptional regulatory sites including mammalian
transcription factor binding sites, splice sites, poly(A) addition sites and
promoters, as well as prokaryotic (E. coli) regulatory sites, 3) be devoid of
unwanted restriction sites, e.g., those which are likely to interfere with standard
cloning procedures, and 4) have a low DNA sequence identity compared to each
other in order to minimize genetic rearrangements when both are present inside
the same cell. In addition, desired sequences, e.g., a Kozak sequence or
restriction enzyme recognition sites, may be identified and introduced.

Not all design criteria could be met equally well at the same time. The
following priority was established for reduction of transcriptional regulatory
sites: elimination of transcription factor (TF) binding sites received the highest
priority, followed by elimination of splice sites and poly(A) addition sites, and
finally prokaryotic regulatory sites. When removing regulatory sites, the strategy
was to work from the least important to the most important to ensure that the
most important changes were made last. Then the sequence was rechecked for
the appearance of new lower priority sites and additional changes were made as
needed. Thus, the process for designing the synthetic GR and RD gene
sequences, using computer programs described herein, involved 5 optionally
iterative steps that are detailed below.

1. Optimized codon usage and changed A224V to create GRver1;
   separately changed A224H, S247H, H348Q and N346I to create
   RDver1. These particular amino acid changes were maintained
   throughout all subsequent manipulations to the sequence.

2. Removed undesired restriction sites, prokaryotic regulatory sites,
   splice sites, and poly(A) sites, thereby creating GRver2 and RDver2.

3. Removed transcription factor binding sites (first pass) and removed
   any newly created undesired sites as listed in step 2 above, thereby
   creating GRver3 and RDver3.

4. Removed transcription factor binding sites created by step 3 above
   (second pass) and removed any newly created undesired sites as listed
   in step 2 above, thereby creating GRver4 and RDver4.

5. Removed transcription factor binding sites created by step 4 above
   (third pass) and confirmed absence of sites listed in step 2 above,
   thereby creating GRver5 and RDver5.

6. Constructed the actual genes by PCR using synthetic oligonucleotides
   corresponding to fragments of the GRver5 and RDver5 designed
   sequences (Figures 6 and 10), thereby creating GR6 and RD7. GR6,
   upon sequencing, was found to have the serine residue at amino acid
   position 49 mutated to an asparagine and the proline at amino acid
   position 230 mutated to a serine (S49N, P230S). RD7, upon
   sequencing, was found to have the histidine at amino acid position 36
   mutated to a tyrosine (H36Y). These changes occurred during the
   PCR process.

7. The mutations described in step 6 above (S49N, P230S for GR6 and
   H36Y for RD7) were reversed to create GRver5.1 and RDver5.1.

8. RDver5.1 was further modified by changing the arginine codon at
   position 351 to a glycine codon (R351G), thereby creating RDver5.2
   with improved spectral properties compared to RDver5.1.

9. RDver5.2 was further mutated to increase luminescence intensity,
   thereby creating RD156-1H9, which encodes four additional amino
   acid changes (M2I, S349T, K488T, E538V) and three silent single
   base changes (SEQ ID NO:18).

1. Optimize codon usage and introduce mutations determining luminescence 
color 

The starting gene sequence for this design step was YG #81-6G01 (SEQ ID 
NO:2). 

a) Optimize codon usage:

The strategy was to adapt the codon usage for optimal expression in
human cells and at the same time to avoid E. coli low-usage codons. Based on
these requirements, the best two codons for expression in human cells for all
amino acids with more than two codons were selected (see Wada et al., 1990).
In the selection of codon pairs for amino acids with six codons, the selection was
biased towards pairs that have the largest number of mismatched bases to allow
design of GR and RD genes with minimum sequence identity (codon
distinction):

Arg: CGC/CGT    Leu: CTG/TTG    Ser: TCT/AGC
Thr: ACC/ACT    Pro: CCA/CCT    Ala: GCC/GCT
Gly: GGC/GGT    Val: GTC/GTG    Ile: ATC/ATT

Based on this selection of codons, two gene sequences encoding the YG#81-6G01
luciferase protein sequence were computer generated. The two genes were
designed to have minimum DNA sequence identity and at the same time closely
similar codon usage. To achieve this, each codon in the two genes was replaced
by a codon from the limited list described above in an alternating fashion (e.g.,
Arg(n) is CGC in gene 1 and CGT in gene 2, Arg(n+1) is CGT in gene 1 and CGC
in gene 2).
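
The alternating assignment described above can be illustrated with a minimal Python sketch. It is not the program used to design the genes. The function name alternate_codons and the per-amino-acid occurrence counter are assumptions made for illustration, and only the nine amino acids listed in the codon pair table are handled, because the source does not list the codons chosen for the remaining residues.

```python
from collections import defaultdict

# Codon pairs taken from the table above, keyed by one-letter amino acid code.
CODON_PAIRS = {
    "R": ("CGC", "CGT"), "L": ("CTG", "TTG"), "S": ("TCT", "AGC"),
    "T": ("ACC", "ACT"), "P": ("CCA", "CCT"), "A": ("GCC", "GCT"),
    "G": ("GGC", "GGT"), "V": ("GTC", "GTG"), "I": ("ATC", "ATT"),
}

def alternate_codons(protein, pairs=CODON_PAIRS):
    """Back-translate a protein into two genes with alternating codon choices.

    Assumption: alternation is tracked per amino acid, so the n-th and
    (n+1)-th occurrence of the same residue receive swapped codons.
    Residues missing from `pairs` raise KeyError (not covered by the source table).
    """
    seen = defaultdict(int)          # occurrences of each amino acid so far
    gene1, gene2 = [], []
    for aa in protein:
        first, second = pairs[aa]
        if seen[aa] % 2:             # odd occurrence: swap the pair
            first, second = second, first
        gene1.append(first)
        gene2.append(second)
        seen[aa] += 1
    return "".join(gene1), "".join(gene2)

if __name__ == "__main__":
    g1, g2 = alternate_codons("RRLL")
    print(g1)
    print(g2)
```

Because the two genes always receive different codons for these residues, every such position contributes at least one mismatched base, while both genes end up with nearly equal counts of each codon.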

For subsequent steps in the design process it was anticipated that changes
had to be made to this limited optimal codon selection in order to meet other
design criteria; however, the following low-usage codons in mammalian cells
were not used unless needed to meet criteria of higher priority:

Arg: CGA    Leu: CTA    Ser: TCG
Pro: CCG    Val: GTA    Ile: ATA

Also, the following low-usage codons in E. coli were avoided when reasonable
(note that 3 of these match the low-usage list for mammalian cells):

Arg: CGA/CGG/AGA/AGG
Leu: CTA    Pro: CCC    Ile: ATA



b) Introduce mutations determining luminescence color: 

Into one of the two codon-optimized gene sequences was introduced the
single green-shifting mutation and into the other were introduced the 4
red-shifting mutations as described above.

The two output sequences from this first design step were named GRver1
(version 1 GR) and RDver1 (version 1 RD). Their DNA sequences are 63%
identical (594 mismatches), while the proteins they encode differ only by the 4
amino acids that determine luminescence color (see Figures 2 and 3 for an
alignment of the DNA and protein sequences).

Tables 1 and 2 show, as an example, the codon usage for valine and
leucine in human genes, the parent gene YG#81-6G01, the codon-optimized
synthetic genes GRver1 and RDver1, as well as the final versions of the
synthetic genes after completion of step 5 in the design process (GRver5 and
RDver5). For a complete summary of the codon changes, see Figures 4 and 5.
Table 1: Valine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
GTA        4      13        0        0        1        1
GTC       13       4       25       24       21       26
GTG       24      12       25       25       25       17
GTT        9      20        0        0        3        5

Table 2: Leucine

Codon   Human   Parent   GRver1   RDver1   GRver5   RDver5
CTA        3       5        0        0        0        0
CTC       12       4        0        1       12       11
CTG       24       4       28       27       19       18
CTT        6      12        0        0        1        1
TTA        3      17        0        0        0        0
TTG        6      13       27       27       23       25


2. Remove undesired restriction sites, prokaryotic regulatory sites, splice sites
and poly(A) addition sites

The starting gene sequences for this design step were GRver1 and RDver1.

a) Remove undesired restriction sites:

To check for the presence and location of undesired restriction sites, the
sequences of both synthetic genes were compared against a database of
restriction enzyme recognition sequences (REBASE ver.712,
http://www.neb.com/rebase) using standard sequence analysis software
(GenePro ver 6.10, Riverside Scientific Ent.).

Specifically, the following restriction enzymes were classified as undesired:

- BamH I, Xho I, Sfi I, Kpn I, Sac I, Mlu I, Nhe I, Sma I, Xho I, Bgl II,
  Hind III, Nco I, Nar I, Xba I, Hpa I, Sal I,
- other cloning sites commonly used: EcoR I, EcoR V, Cla I,
- eight-base cutters (commonly used for complex constructs),
- BstE II (to allow N-terminal fusions),
- Xcm I (can generate A/T overhang used for T-vector cloning).

To eliminate undesired restriction sites when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with the
codon optimization guidelines described in 1a above.

b) Remove prokaryotic (E. coli) regulatory sequences: 

To check for the presence and location of prokaryotic regulatory
sequences, the sequences of both synthetic genes were searched for the presence
of the following consensus sequences using standard sequence analysis software
(GenePro):

- TATAAT (-10 Pribnow box of promoter) 

- AGGA or GGAG (ribosome binding site; only considered if paired
  with a methionine codon 12 or fewer bases downstream).

To eliminate such regulatory sequences when found in a synthetic gene, one or
more codons of the synthetic gene sequence were altered in accordance with
the codon optimization guidelines described in 1a above.
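
The search for prokaryotic regulatory sequences can be sketched as follows. This is an illustrative Python fragment and not the GenePro software named in the text. Two points are assumptions: the distance of "12 or fewer bases downstream" is measured from the end of the ribosome binding site motif to the first base of the ATG, and an ATG in any reading frame counts as a methionine codon.

```python
import re

PRIBNOW = "TATAAT"              # -10 Pribnow box of promoter
RBS_MOTIFS = ("AGGA", "GGAG")   # candidate ribosome binding sites
MAX_GAP = 12                    # maximum bases between motif end and ATG

def find_all(seq, motif):
    """Return 0-based start positions of all (overlapping) exact matches."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def find_rbs_with_start(seq, max_gap=MAX_GAP):
    """Return (position, motif) for each RBS motif followed by a nearby ATG.

    Assumption: the gap is counted from the base after the motif to the
    first base of ATG, and ATG may be in any frame.
    """
    hits = []
    for motif in RBS_MOTIFS:
        for pos in find_all(seq, motif):
            end = pos + len(motif)
            # An ATG starting 0..max_gap bases after the motif lies in this window.
            window = seq[end:end + max_gap + 3]
            if "ATG" in window:
                hits.append((pos, motif))
    return sorted(hits)

if __name__ == "__main__":
    demo = "CCCTATAATCCAGGACCCCCATGCC"   # made-up test sequence
    print(find_all(demo, PRIBNOW))
    print(find_rbs_with_start(demo))
```

Each reported position marks a codon neighborhood that would then be altered by hand or by a further routine, within the codon guidelines of 1a.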
c) Remove splice sites: 
To check for the presence and location of splice sites, the DNA strand
corresponding to the primary RNA transcript of each synthetic gene was
searched for the presence of the following consensus sequences (see Watson et
al., 1983) using standard sequence analysis software (GenePro):

- splice donor site: AG | GTRAGT (exon | intron), the search was
  performed for AGGTRAG and the lower stringency GGTRAGT;
- splice acceptor site: (Y)nNCAG | G (intron | exon), the search was
  performed with n = 1.

To eliminate splice sites found in a synthetic gene, one or more codons of the
synthetic gene sequence were altered in accordance with the codon optimization
guidelines described in 1a above. Splice acceptor sites were generally difficult
to eliminate in one gene without introducing them into the other gene because
they tended to contain one of the only two Gln codons (CAG); they were
removed by placing the Gln codon CAA in both genes at the expense of a
slightly increased sequence identity between the two genes.
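
The splice site search can be illustrated by translating the degenerate consensus sequences into regular expressions. The Python sketch below is an assumption-laden illustration, not the GenePro search: the helper names are hypothetical, the acceptor pattern YNCAGG is the n = 1 form of (Y)nNCAG | G, and only the coding strand is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

SPLICE_DONORS = ("AGGTRAG", "GGTRAGT")   # both stringencies named in the text
SPLICE_ACCEPTOR = "YNCAGG"               # (Y)nNCAG | G with n = 1

def consensus_to_regex(consensus):
    """Convert an IUPAC consensus string to a regular expression."""
    return "".join(IUPAC[base] for base in consensus)

def find_sites(seq, consensus):
    """Return (0-based position, matched text) for all overlapping matches."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    demo = "TTAGGTAAGTCC"   # made-up sequence containing a donor site
    for donor in SPLICE_DONORS:
        print(donor, find_sites(demo, donor))
    print(SPLICE_ACCEPTOR, find_sites("AACTCAGGAA", SPLICE_ACCEPTOR))
```

The same find_sites helper also locates unambiguous motifs such as the poly(A) signal AATAAA, since plain bases map to themselves.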

d) Remove poly(A) addition sites:

To check for the presence and location of poly(A) addition sites, the
sequences of both synthetic genes were searched for the presence of the
following consensus sequence using standard sequence analysis software
(GenePro):

- AATAAA.

To eliminate each poly(A) addition site found in a synthetic gene, one or more
codons of the synthetic gene sequence were altered in accordance with the codon
optimization guidelines described in 1a above. The two output sequences from
this second design step were named GRver2 and RDver2. Their DNA sequences
are 63% identical (590 mismatches) (Figs. 2 and 3).

3. Remove transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver2 and
RDver2.

To check for the presence, location and identity of potential TF binding sites, the
sequences of both synthetic genes were used as query sequences to search a
database of transcription factor binding sites (TRANSFAC v3.2). The
TRANSFAC database (http://transfac.gbf.de/TRANSFAC/index.html) holds
information on gene regulatory DNA sequences (TF binding sites) and proteins
(TFs) that bind to and act through them. The SITE table of TRANSFAC Release
3.2 contains 4,401 entries of individual (putative) TF binding sites (including TF
binding sites in eukaryotic genes, in artificial sequences resulting from
mutagenesis studies and in vitro selection procedures based on random
oligonucleotide mixtures or specific theoretical considerations, and consensus
binding sequences (from Faisst and Meyer, 1992)).

The software tool used to locate and display these TF binding sites in the
synthetic gene sequences was TESS (Transcription Element Search Software,
http://agave.humgen.upenn.edu/tess/index.html). The filtered string-based
search option was used with the following user-defined search parameters:

- Factor Selection Attribute: Organism Classification
- Search Pattern: Mammalia
- Max. Allowable Mismatch %: 0
- Min. element length: 5
- Min. log-likelihood: 10

This parameter selection specifies that only mammalian TF binding sites
(approximately 1,400 of the 4,401 entries in the database) that are at least 5 bases
long will be included in the search. It further specifies that only TF binding sites
that have a perfect match in the query sequence and a minimum log likelihood
(LLH) score of 10 will be reported. The LLH scoring method assigns 2 to an
unambiguous match, 1 to a partially ambiguous match (e.g., A or T match W)
and 0 to a match against 'N'. For example, a search with parameters specified
above would result in a "hit" (positive result or match) for TATAA (SEQ ID
NO:240) (LLH = 10), STRATG (SEQ ID NO:241) (LLH = 10), and
MTTNCNNMA (SEQ ID NO:242) (LLH = 10) but not for TRATG (SEQ ID
NO:243) (LLH = 9) if these four TF binding sites were present in the query
sequence. A lower stringency test was performed at the end of the design
process to re-evaluate the search parameters.
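
The LLH scoring rule stated above is simple enough to check by hand, and the following Python sketch reproduces the four example scores. It is not the TESS implementation. The function names are hypothetical, and treating every IUPAC code other than A, C, G, T and N as "partially ambiguous" (score 1) is an assumption that extends the single W example given in the text.

```python
UNAMBIGUOUS = set("ACGT")
PARTIALLY_AMBIGUOUS = set("RYSWKMBDHV")   # assumption: all score 1, as W does

def llh_score(site):
    """Score a binding site: 2 per unambiguous base, 1 per partially
    ambiguous base, 0 per N."""
    score = 0
    for base in site.upper():
        if base in UNAMBIGUOUS:
            score += 2
        elif base in PARTIALLY_AMBIGUOUS:
            score += 1
        elif base != "N":
            raise ValueError(f"unexpected base code: {base!r}")
    return score

def is_reported(site, min_length=5, min_llh=10):
    """Apply the minimum element length and minimum LLH filters."""
    return len(site) >= min_length and llh_score(site) >= min_llh

if __name__ == "__main__":
    for site in ("TATAA", "STRATG", "MTTNCNNMA", "TRATG"):
        print(site, llh_score(site), is_reported(site))
```

Under this rule a site of five unambiguous bases is the shortest site that can reach the threshold of 10, which is consistent with the minimum element length of 5 used in the search.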

When TESS was tested with a mock query sequence containing known
TF binding sites it was found that the program was unable to report matches to
sites ending with the 3' end of the query sequence. Thus, an extra nucleotide
was added to the 3' end of all query sequences to eliminate this problem.

The first search for TF binding sites using the parameters described
above found about 100 transcription factor binding sites (hits) for each of the
two synthetic genes (GRver2 and RDver2). All sites were eliminated by
changing one or more codons of the synthetic gene sequences in accordance with
the codon optimization guidelines described in 1a above. However, it was
expected that some of these changes created new TF binding sites, other regulatory
sites, and new restriction sites. Thus, steps 2 a-d were repeated as described, and
4 new restriction sites and 2 new splice sites were removed. The two output
sequences from this third design step were named GRver3 and RDver3. Their
DNA sequences are 66% identical (541 mismatches) (Figs. 2 and 3).

4. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver3 and
RDver3.

This fourth step is an iteration of the process described in step 3. The search for
newly introduced TF binding sites yielded about 50 hits for each of the two
synthetic genes. All sites were eliminated by changing one or more codons of
the synthetic gene sequences in general accordance with the codon optimization
guidelines described in 1a above. However, more high to medium usage codons
were used to allow elimination of all TF binding sites. The lowest priority was
placed on maintaining low sequence identity between the GR and RD genes.
Then steps 2 a-d were repeated as described. The two output sequences from
this fourth design step were named GRver4 and RDver4. Their DNA sequences
are 68% identical (506 mismatches) (Figs. 2 and 3).

5. Remove new transcription factor (TF) binding sites, then repeat steps 2 a-d

The starting gene sequences for this design step were GRver4 and
RDver4.

This fifth step is another iteration of the process described in step 3 above. The
search for new TF binding sites introduced in step 4 yielded about 20 hits for
each of the two synthetic genes. All sites were eliminated by changing one or
more codons of the synthetic gene sequences in general accordance with the
codon optimization guidelines described in 1a above. However, more high to
medium usage codons were used (these are all considered "preferred") to allow
elimination of all TF binding sites. The lowest priority was placed on
maintaining low sequence identity between the GR and RD genes. Then steps 2
a-d were repeated as described. Only one acceptor splice site could not be
eliminated. As a final step the absence of all TF binding sites in both genes as
specified in step 3 was confirmed. The two output sequences from this fifth and
last design step were named GRver5 and RDver5. Their DNA sequences are
69% identical (504 mismatches) (Figs. 2 and 3).

Additional evaluation of GRver5 and RDver5

a) Use lower stringency parameters for TESS:

The search for TF binding sites was repeated as described in step 3 above, but
with even less stringent user-defined parameters:

- setting LLH to 9 instead of 10 did not result in new hits;
- setting LLH to 0 through 8 (incl.) resulted in hits for two additional
  sites, MAMAG (22 hits) and CTKTK (24 hits);
- setting LLH to 8 and the minimum element length to 4, the search
  yielded (in addition to the two sites above) different 4-base sites for
  AP-1, NF-1, and c-Myb that are shortened versions of their longer
  respective consensus sites which were eliminated in steps 3-5 above.

It was not realistic to attempt complete elimination of these sites without
introduction of new sites, so no further changes were made.

b) Search different database:

The Eukaryotic Promoter Database (release 45) contains information about
reliably mapped transcription start sites (1253 sequences) of eukaryotic genes.
This database was searched using BLASTN 1.4.11 with default parameters
(optimized to find nearly identical sequences rapidly; see Altschul et al., 1990) at
the National Center for Biotechnology Information site
(http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST). To test this approach, a portion
of pGL3-Control vector sequence containing the SV40 promoter and enhancer
was used as a query sequence, yielding the expected hits to SV40 sequences. No
hits were found when using the two synthetic genes as query sequences.


Summary of GRver5 and RDver5 synthetic gene properties

Both genes, which at this stage were still only "virtual" sequences in the
computer, have a codon usage that strongly favors mammalian high-usage
codons and minimizes mammalian and E. coli low-usage codons. Figure 4
shows a summary of the codon usage of the parent gene and the various
synthetic gene versions.

Both genes are also completely devoid of eukaryotic TF binding sites
consisting of more than four unambiguous bases, donor and acceptor splice sites
(one exception: GRver5 contains one splice acceptor site), poly(A) addition sites,
specific prokaryotic (E. coli) regulatory sequences, and undesired restriction
sites.

The gene sequence identity between GRver5 and RDver5 is only 69%
(504 base mismatches) while their encoded proteins are 99% identical (4 amino
acid mismatches); see Figures 2 and 3. Their identity with the parent sequence
YG#81-6G01 is 74% (GRver5) and 73% (RDver5); see Figure 2. Their base
composition is 49.9% GC (GRver5) and 49.5% GC (RDver5), compared to
40.2% GC for the parent YG#81-6G01.
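
The identity percentages reported for the five design versions follow from the mismatch counts and the 1,626 bp length of the designed coding sequence. The short Python sketch below works through that arithmetic. The mismatch counts are those given in the text; that the reported percentages are truncated rather than rounded is an inference from the numbers (for example, 590 mismatches gives 63.7%, reported as 63%) and is not stated in the source.

```python
GENE_LENGTH_BP = 1626   # length of each designed gene, from the oligonucleotide design

# GR versus RD base mismatches after each design step, as reported in the text.
MISMATCHES = {"ver1": 594, "ver2": 590, "ver3": 541, "ver4": 506, "ver5": 504}

def percent_identity(mismatches, length=GENE_LENGTH_BP):
    """Percent of aligned positions that are identical."""
    return 100.0 * (length - mismatches) / length

if __name__ == "__main__":
    for version, count in MISMATCHES.items():
        pct = percent_identity(count)
        # int() truncates, which matches the reported values (assumption noted above).
        print(f"{version}: {pct:.2f}% identical, reported as {int(pct)}%")
```

The steady rise from 63% to 69% shows the cost, in sequence divergence, of giving TF binding site removal priority over codon distinction.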

Construction of synthetic genes

The two synthetic genes were constructed by assembly from synthetic
oligonucleotides in a thermocycler followed by PCR amplification of the
full-length genes (similar to Stemmer et al. (1995) Gene. 164, pp. 49-53).
Unintended mutations that interfered with the design goals of the synthetic genes
were corrected.

a) Design of synthetic oligonucleotides:

The synthetic oligonucleotides were mostly 40mers that collectively code
for both complete strands of each designed gene (1,626 bp) plus flanking regions
needed for cloning (1,950 bp total for each gene; Figure 6). The 5' and 3'
boundaries of all oligonucleotides specifying one strand were generally placed in
a manner to give an average offset/overlap of 20 bases relative to the boundaries
of the oligonucleotides specifying the opposite strand.
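
The offset tiling of the two strands can be sketched as follows. This Python fragment is a simplified illustration and not the procedure actually used to lay out the 183 oligonucleotides: it assumes strictly regular 40-base pieces with a fixed 20-base offset, allows shorter pieces at the ends, and ignores the manual adjustments described in the text. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def tile_oligos(top, length=40, offset=20):
    """Split a duplex into top- and bottom-strand oligos, all written 5'->3'.

    Top-strand boundaries fall every `length` bases; bottom-strand
    boundaries are shifted by `offset`, so each oligo overlaps two
    oligos of the opposite strand. End pieces may be shorter.
    """
    top_oligos = [top[i:i + length] for i in range(0, len(top), length)]
    bounds = [0] + list(range(offset, len(top), length)) + [len(top)]
    bottom_oligos = [
        reverse_complement(top[a:b])
        for a, b in zip(bounds, bounds[1:])
        if b > a
    ]
    return top_oligos, bottom_oligos

if __name__ == "__main__":
    duplex = "ACGT" * 30                 # made-up 120-base sequence
    tops, bottoms = tile_oligos(duplex)
    print([len(o) for o in tops])
    print([len(o) for o in bottoms])
```

With a 20-base offset, every internal junction on one strand sits in the middle of an oligonucleotide on the other strand, which is what allows the pieces to prime on one another during thermocycler assembly.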

The ends of the flanking regions of both genes matched the ends of the
amplification primers (pRAMtailup: 5'-gtactgagac gacgccagcccaagcttaggcctgagtg
SEQ ID NO:229, and pRAMtaildn: 5'-ggcatgagcgt gaactgactgaactagcggccgccgag
SEQ ID NO:230) to allow cloning of the genes into our E. coli expression vector
pRAM (WO99/14336).

A total of 183 oligonucleotides were designed (Figure 6): fifteen
oligonucleotides that collectively encode the upstream and downstream flanking
sequences (identical for both genes; SEQ ID NOs: 35-49) and 168
oligonucleotides (4 x 42) that encode both strands of the two genes (SEQ ID
NOs: 50-217).

All 183 oligonucleotides were run through the hairpin analysis of the
OLIGO software (OLIGO 4.0 Primer Analysis Software © 1989-1991 by
Wojciech Rychlik) to identify potentially detrimental intra-molecular loop
formation. The guidelines for evaluating the analysis results were set according
to recommendations of Dr. Sims (Sigma-Genosys Custom Gene Synthesis
Department): oligos forming hairpins with ΔG < -10 have to be avoided, those
forming hairpins with ΔG < -7 involving the 3' end of the oligonucleotide should
also be avoided, while those with an overall ΔG < -5 should not pose a problem
for this application. The analysis identified 23 oligonucleotides able to form
hairpins with a ΔG between -7.1 and -4.9. Of these, 5 had blocked or nearly
blocked 3' ends (0-3 free bases) and were re-designed by removing 1-4 bases at
their 3' end and adding them to the adjacent oligonucleotide.

The 40mer oligonucleotide covering the sequence complementary to the
poly(A) tail had a very low complexity 3' end (13 consecutive T bases). An
additional 40mer was designed with a high complexity 3' end but a consequently
reduced overlap with one of its complementary oligonucleotides (11 instead of
20 bases) on the opposite strand.

Even though the oligos were designed for use in a thermocycler-based
assembly reaction, they could also be used in a ligation-based protocol for gene
construction. In this approach, the oligonucleotides are annealed in a pairwise
fashion and the resulting short double-stranded fragments are ligated using the
sticky overhangs. However, this would require that all oligonucleotides be
phosphorylated.

b) Gene assembly and amplification

In a first step, each of the two synthetic genes was assembled in a
separate reaction from 98 oligonucleotides. The total volume for each reaction
was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
1.0 U Taq DNA polymerase
0.02 U Pfu DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: (94°C for 30 seconds, 52°C for 30
seconds, and 72°C for 30 seconds) x 55 cycles.
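
The stated amount of each oligonucleotide can be checked with a short calculation. The Python sketch below assumes that the oligonucleotide concentration is 0.5 µM total (the unit is read from the parallel corrective assembly recipe and is an assumption here) and that the 98 oligonucleotides are present in equal amounts. The function name is hypothetical.

```python
def pmol_per_oligo(total_conc_um, volume_ul, n_oligos):
    """Amount of each oligo in pmol, assuming an equimolar mixture.

    1 µM equals 1 pmol per µl, so concentration (µM) x volume (µl)
    gives the total amount in pmol.
    """
    total_pmol = total_conc_um * volume_ul
    return total_pmol / n_oligos

if __name__ == "__main__":
    amount = pmol_per_oligo(total_conc_um=0.5, volume_ul=50, n_oligos=98)
    print(f"{amount:.3f} pmol of each oligonucleotide")
```

The result, roughly a quarter of a picomole per oligonucleotide, agrees with the 0.25 pmoles stated in the reaction recipe.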



In a second step, each assembled synthetic gene was amplified in a
separate reaction. The total volume for each reaction was 50 µl:

2.5 µl assembly reaction
5.0 U Taq DNA polymerase
0.1 U Pfu DNA polymerase
1 µM each primer (pRAMtailup, pRAMtaildn)
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: (94°C for 20 seconds, 65°C for 60
seconds, 72°C for 3 minutes) x 30 cycles.

The assembled and amplified genes were subcloned into the pRAM
vector and expressed in E. coli, yielding 1-2% luminescent GR or RD clones.
Five GR and five RD clones were isolated and analyzed further. Of the five GR
clones, three had the correct insert size, of which one was weakly luminescent
and one had an altered restriction pattern. Of the five RD clones, two had the
correct size insert with an altered restriction pattern and one of those was weakly
luminescent. Overall, the analysis indicated the presence of a large number of
mutations in the genes, most likely the result of errors introduced in the
assembly and amplification reactions.

c) Corrective assembly and amplification

To remove the large number of mutations present in the full-length
synthetic genes we performed an additional assembly and amplification reaction
for each gene using the proof-reading DNA polymerase Tli. The assembly
reaction contained, in addition to the 98 GR or RD oligonucleotides, a small
amount of DNA from the corresponding full-length clones with mutations
described above. This allows the oligos to correct mutations present in the
templates.

The following assembly reaction was performed for each of the synthetic
genes. The total volume for each reaction was 50 µl:

0.5 µM oligonucleotides (= 0.25 pmoles of each oligo)
0.016 pmol plasmid (mix of clones with correct insert size)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
0.1% gelatin
Cycling conditions: 94°C for 30 seconds, then (94°C for
30 seconds, 52°C for 30 seconds, 72°C for 30 seconds) for
55 cycles, then 72°C for 5 minutes.

The following amplification reaction was performed on each of the
assembly reactions. The total volume for each amplification reaction was 50 µl:

1-5 µl of assembly reaction
40 pmol each primer (pRAMtailup, pRAMtaildn)
2.5 U Tli DNA polymerase
2 mM MgCl2
0.2 mM dNTPs (each)
Cycling conditions: 94°C for 30 seconds, then (94°C for
20 seconds, 65°C for 60 seconds and 72°C for 3 minutes)
for 30 cycles, then 72°C for 5 minutes.

The genes obtained from the corrective assembly and amplification step
were subcloned into the pRAM vector and expressed in E. coli, yielding 75%
luminescent GR or RD clones. Forty-four GR and 44 RD clones were analyzed
with our screening robot (WO99/14336). The six best GR and RD clones were
manually analyzed and one best GR and RD clone was selected (GR6 and RD7).
Sequence analysis of GR6 revealed two point mutations in the coding region,
both of which resulted in an amino acid substitution (S49N and P230S).
Sequence analysis of RD7 revealed three point mutations in the coding region,
one of which resulted in an amino acid substitution (H36Y). It was confirmed
that none of the silent point mutations introduced any regulatory or restriction
sites conflicting with the overall design criteria for the synthetic genes.

d) Reversal of unintended amino acid substitutions

The unintended amino acid substitutions present in the GR6 and RD7
synthetic genes were reversed by site-directed mutagenesis to match the GRver5
and RDver5 designed sequences, thereby creating GRver5.1 and RDver5.1. The
DNA sequences of the mutated regions were confirmed by sequence analysis.

e) Improve spectral properties

The RDver5.1 gene was further modified to improve its spectral
properties by introducing an amino acid change (R351G), thereby creating RDver5.2.

pGL3 vectors with RD and GR genes

The parent click beetle luciferase YG#81-6G01 ("YG"), and the synthetic
click beetle luciferase genes GRver5.1 ("GR"), RDver5.2 ("RD"), and
RD156-1H9 were cloned into the four pGL3 reporter vectors (Promega Corp.):

- pGL3-Basic = no promoter, no enhancer
- pGL3-Control = SV40 promoter, SV40 enhancer
- pGL3-Enhancer = SV40 enhancer (3' to luciferase coding sequences)
- pGL3-Promoter = SV40 promoter.

The primers employed in the assembly of GR and RD synthetic genes facilitated
the cloning of those genes into pRAM vectors. To introduce the genes into
pGL3 vectors (Promega Corp., Madison, WI) for analysis in mammalian cells,
each gene in a pRAM vector (pRAM RDver5.1, pRAM GRver5.1, and pRAM
RD156-1H9) was amplified to introduce an Nco I site at the 5' end and an Xba I
site at the 3' end of the gene. The primers for pRAM RDver5.1 and pRAM
GRver5.1 were:

GR -> 5' GGA TCC CAT GGT GAA GCG TGA GAA 3' (SEQ ID NO:231) or
RD -> 5' GGA TCC CAT GGT GAA ACG CGA 3' (SEQ ID NO:232) and
5' CTA GCT TTT TTT TCT AGA TAA TCA TGA AGA C 3' (SEQ ID NO:233)

The primers for pRAM RD156-1H9 were:

5' GCG TAG CCA TGG TAA AGC GTG AGA AAA ATG TC 3' (SEQ ID NO:295) and
5' CCG ACT CTA GAT TAC TAA CCG CCG GCC TTC ACC 3' (SEQ ID NO:296)

The PCR included:

100 ng plasmid DNA
1 µM upstream primer
1 µM downstream primer
0.2 mM dNTPs
1X buffer (Promega Corp.)
5 units Pfu DNA polymerase (Promega Corp.)
Sterile nanopure H2O to 50 µl
The cycling parameters were: 94°C for 5 minutes; (94°C for 30 seconds;
55°C for 1 minute; and 72°C for 3 minutes) x 15 cycles. The purified PCR
product was digested with Nco I and Xba I, ligated with pGL3-control that was
also digested with Nco I and Xba I, and the ligated products introduced to E. coli.
To insert the luciferase genes into the other pGL3 reporter vectors (basic,
promoter and enhancer), the pGL3-control vectors containing each of the
luciferase genes were digested with Nco I and Xba I, ligated with the other pGL3
vectors that also were digested with Nco I and Xba I, and the ligated products
introduced to E. coli. Note that the polypeptide encoded by the GRver5.1 and
RDver5.1 (and RD156-1H9, see below) nucleic acid sequences in pGL3 vectors
has an amino acid substitution at position 2 to valine as a result of the Nco I site
at the initiation codon in the oligonucleotide.
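
The position-2 substitution can be checked directly from the upstream primers. The following is a minimal sketch, not part of the original procedure: it locates the Nco I site (CCATGG) in the GR and RD primers (SEQ ID NO:231 and 232), starts reading at the embedded ATG, and translates the complete codons that follow. The function name and the deliberately partial codon table are illustrative assumptions.

```python
# Partial codon table: only the codons that occur in the two primers
# (an illustrative subset, not the full genetic code).
CODONS = {
    "ATG": "M", "GTG": "V", "AAG": "K", "AAA": "K",
    "CGT": "R", "CGC": "R", "GAG": "E",
}

def translate_from_nco1(primer: str) -> str:
    """Translate complete codons starting at the ATG inside the Nco I site."""
    seq = primer.replace(" ", "").upper()
    start = seq.index("CCATGG") + 2  # CC|ATGG: the ATG begins 2 bases in
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    return "".join(CODONS[c] for c in codons)

gr_primer = "GGA TCC CAT GGT GAA GCG TGA GAA"  # SEQ ID NO:231
rd_primer = "GGA TCC CAT GGT GAA ACG CGA"      # SEQ ID NO:232

gr_peptide = translate_from_nco1(gr_primer)
rd_peptide = translate_from_nco1(rd_primer)
```

Because the Nco I recognition sequence fixes a G immediately after the ATG, the second codon must begin with G; in both primers it is GTG, which encodes valine.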

Because of internal Nco I and Xba I sites, the native gene in YG#81-6G01
was amplified from a Hind III site upstream to a Hpa I site downstream of
the coding region, including flanking sequences found in the GR and
RD clones. The upstream primer (5'-CAA AAA GCT TGG CAT TCC GGT
ACT GTT GGT AAA GCC ACC ATG GTG AAG CGA GAG-3'; SEQ ID
NO:234) and a downstream primer (5'-CAA TTG TTG TTG TTA ACT TGT
TTA TT-3'; SEQ ID NO:235) were mixed with YG#81-6G01 and amplified
using the PCR conditions above. The purified PCR product was digested with
Nco I and Xba I, ligated with pGL3-control that was also digested with Hind III
and Hpa I, and the ligated products introduced into E. coli. To insert YG#81-6G01
into the other pGL3 reporter vectors (basic, promoter and enhancer), the
pGL3-control vectors containing YG#81-6G01 were digested with Nco I and
Xba I, ligated with the other pGL3 vectors that also were digested with Nco I and
Xba I, and the ligated products introduced to E. coli. Note that the clone of
YG#81-6G01 in the pGL3 vectors has a C instead of an A at base 786, which
yields a change in the amino acid sequence at residue 262 from Phe to Leu
(Figure 2 shows the sequence of YG#81-6G01 prior to introduction into pGL3
vectors). To determine whether the altered amino acid at position 262 affected
the enzyme biochemistry, the clone of YG#81-6G01 was mutated to resemble
the original sequence. Both clones were then tested for expression in E. coli,
physical stability, substrate binding, and luminescence output kinetics. No
significant differences were found.

Partially purified enzymes expressed from the synthetic genes and the 
parent gene were employed to determine Km for luciferin and ATP (see Table 
3). 



Table 3

Enzyme        Km (LH2)    Km (ATP)
YG parent     2 µM        17 µM
GR            1.3 µM      25 µM
RD            24.5 µM     46 µM



In vitro eukaryotic transcription/translation reactions were also conducted
using Promega's TNT T7 Quick system according to manufacturer's
instructions. Luminescence levels were 1 to 37-fold and 1 to 77-fold higher
(depending on the reaction time) for the synthetic GR and RD genes,
respectively, compared to the parent gene (corrected for luminometer spectral
sensitivity).

To test whether the synthetic click beetle luciferase genes have improved
expression in mammalian cells relative to the wild type click beetle gene, each
of the synthetic genes and the parent gene was cloned into a series of pGL3
vectors and introduced into CHO cells (Table 8). In all cases, the synthetic
click beetle genes exhibited a higher expression than the native gene.
Specifically, expression of the synthetic GR and RD genes was 1900-fold and
40-fold higher, respectively, than that of the parent (transfection efficiency
normalized by comparison to native Renilla luciferase gene). Moreover, the data
(basic versus control vector) show that the synthetic genes have reduced basal
level transcription.

Further, in experiments with the enhancer vector where the percentage of
activity in reference to the control is compared between the native and synthetic
gene, the data showed that the synthetic genes have reduced risk of anomalous
transcription characteristics. In particular, the parent gene appeared to contain
one or more internal transcriptional regulatory sequences that are activated by
the enhancer in the vector, and thus is not suitable as a reporter gene, while the
synthetic GR and RD genes showed a clean reporter response (transfection
efficiency normalized by comparison to native Renilla luciferase gene). See
Table 9.

The clone names and their corresponding SEQ ID numbers for nucleotide
sequence and amino acid sequence are listed below in Table 4.

Table 4

Clone name     Luciferase Type                 SEQ ID NO.      SEQ ID NO.
                                               (nucleotide)    (amino acid)
LUCPPLYG       Wild type YG Click Beetle       1               23
YG#81-6G01     Mutant YG Click Beetle          2               24
GRver1         Synthetic Green Click Beetle    3               25
GRver2         Synthetic Green Click Beetle    4               26
GRver3         Synthetic Green Click Beetle    5               27
GRver4         Synthetic Green Click Beetle    6               28
GRver5         Synthetic Green Click Beetle    7               29
GR6            Synthetic Green Click Beetle    8               30
GRver5.1       Synthetic Green Click Beetle    9               31
RDver1         Synthetic Red Click Beetle      10              32
RDver2         Synthetic Red Click Beetle      11              33
RDver3         Synthetic Red Click Beetle      12              34
RDver4         Synthetic Red Click Beetle      13              218
RDver5         Synthetic Red Click Beetle      14              219
RD7            Synthetic Red Click Beetle      15              220
RDver5.1       Synthetic Red Click Beetle      16              221
RDver5.2       Synthetic Red Click Beetle      17              222
RD156-1H9      Synthetic Red Click Beetle      18              223
RELLUC         Wild type Renilla               19              224
Rlucver1       Synthetic Renilla               20              225
Rlucver2       Synthetic Renilla               21              226
Rluc-final     Synthetic Renilla               22              227

Example 2

Evolution of the RD luciferase gene

RDver5.2 was mutated to increase its luminescence intensity, thereby creating
RD156-1H9 which carries four additional amino acid changes (M2I, S349T,
K488T, E538V) and three silent point mutations (SEQ ID NO:18).

a) Site-directed mutagenesis:

The initial strategy was to use site-directed mutagenesis. There are four
amino acid differences between the GR and RD synthetic genes with H348Q
providing the greatest contribution to red color. Thus, this substitution may also
cause structural changes in the protein that could lead to low light output.
Optimization of positions near this area could increase light output. The
following positions were selected for mutagenesis:

1. S344 (at the edge of the binding pocket for luciferin) - randomize this
codon.

2. A245 (strictly conserved but closest to 348 and at the edge of the active
site pocket) - randomize this codon.

3. I347 (not conserved, next to 348 in sequence) - mutate to hydrophobic
amino acids only.

4. S349 (not conserved, next to 348 in sequence) - mutate to S, T, A, P
only.

Oligonucleotides designed to mutate the above positions were used in a
site-directed mutagenesis experiment (WO 99/14336) and the resulting mutants
were screened for luminescence intensity. There was little variation in light
intensity and only about 25% were luminescent. For more detailed analysis,
clones were picked and analyzed with the screening robot (PCT/WO99/14336).
None of the clones had a luminescence intensity (LI) higher than RDver5.2, but
four of the clones had slightly lower composite Km for luciferin and ATP (Km).

b) Directed evolution:

Protocols and procedures used for the directed evolution are detailed in
PCT/WO99/14336. DNA from the four clones with lower Km was combined
and three libraries of random mutants were produced. The libraries were
screened with the robot and clones with the highest LI values were selected.
These clones were shuffled together and another robotic screen was completed
with an incubation temperature of 46°C. The three clones with the highest LI
values were RD156-0B4, RD156-1A5, and RD156-1H9.

c) Analysis:

The three clones with the highest LI values were selected for manual analysis to
confirm that their luminescence intensity was higher than that of RDver5.2 and
to ensure that their spectral properties were not compromised. One of the clones
was slightly green-shifted; all others maintained the spectral properties of
RDver5.2 (Table 5).





Table 5

Clone                 Peak (nm)    Width (nm)
RD156-0B4             616          68
RD156-1A5             614          70
RD156-1H9             618          69
RDver5.2 (prep #1)    617          70
RDver5.2 (prep #2)    618          69

The Km values for luciferin and the luminescence intensity relative to
RDver5.2 were determined for all three clones in several independent
experiments. All cell samples were processed with CCLR lysis buffer (E1483,
Promega Corp., Madison, WI) and diluted 1:10 into buffer (25 mM HEPES pH
7.8, 5% glycerol, 1 mg/ml BSA, 150 mM NaCl). Table 6 summarizes the results
(Lum: luminescence values were normalized to optical density; measurements
for independent experiments are separated by forward slashes) from expression
in bacterial cells. RD156-1H9, the clone with the highest luminescence intensity
(5 to 10-fold increase), also has an approximately 2-fold higher Km for luciferin.

Table 6

Clone                 Km Luciferin [µM]    Lum (normalized to RDver5.2)
RD156-0B4             8/10                 2.2/2.5
RD156-1A5             13/13                3.1/5.6
RD156-1H9             20/23/23             4/10.9/7.5
RDver5.2 (prep #1)    12/14/14
RDver5.2 (prep #2)    40/50
GRver5.1 (prep #1)    0.5                  64
GRver5.1 (prep #2)    3





Table 7 shows a comparison between the luminescence intensities of
RD156-1H9, GRver5.1 and RDver5.2 normalized to GRver5.1 with and without
correction for the spectral sensitivity of the luminometer photomultiplier tube.
With correction, the luminescence intensity of clone RD156-1H9 was only about
2-fold lower than that of GRver5.1. The luciferin Km for clone RD156-1H9 is
approximately 40-fold higher than GRver5.1. RD156-1H9 is thermostable at
50°C for at least 2 hours.
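
The fold differences quoted here can be recomputed from the slash-separated replicate values in Table 6. The sketch below is illustrative only: it assumes the replicates are averaged with a simple arithmetic mean and that the GRver5.1 (prep #1) value of 0.5 is the reference, neither of which is stated in the text. The helper name is hypothetical.

```python
from statistics import mean

def replicate_mean(cell: str) -> float:
    """Average slash-separated replicate measurements from one table cell."""
    return mean(float(value) for value in cell.split("/"))

# Luciferin Km values as listed in Table 6
km_rd156_1h9 = replicate_mean("20/23/23")   # mean of three experiments
km_grver51 = replicate_mean("0.5")          # GRver5.1, prep #1
km_rdver52 = replicate_mean("12/14/14")     # RDver5.2, prep #1

fold_vs_gr = km_rd156_1h9 / km_grver51      # basis of "approximately 40-fold"
fold_vs_rd = km_rd156_1h9 / km_rdver52      # basis of "about 2-fold"
```

Under these assumptions the ratio against GRver5.1 comes out at 44, and the ratio against RDver5.2 (prep #1) at a little under 2, in line with the rounded figures given in the text.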

Table 7

Name          No Correction    With Correction
RDver5.2      0.016            0.06
GRver5.1      1.000            1.00
RD156-1H9     0.116            0.45



Tables 8 and 9 show a comparison of luciferase expression levels in CHO
cells. Table 8 shows the expression levels only from the control vectors in
comparison to the firefly luciferase gene (RLU = relative light units). Table 9
shows a comparison of the expression levels in all four pGL3 vectors calculated
as a percent of the expression level in pGL3-control.

Table 8

Synthetic Click Beetle Gene Expression

Control vector    rlu
YG#81-6G01        177
GRver5.1          343,417
RDver5.1          7,161
RD156-1H9         20,802
FireFly           488,016
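
The fold increases over the parent gene cited earlier for the synthetic genes can be reproduced from the Table 8 values. This is a small illustrative calculation; the dictionary and function names are hypothetical.

```python
# Relative light units from the control vectors, as listed in Table 8
RLU = {
    "YG#81-6G01": 177,
    "GRver5.1": 343_417,
    "RDver5.1": 7_161,
    "RD156-1H9": 20_802,
}

def fold_over_parent(clone: str, parent: str = "YG#81-6G01") -> float:
    """Expression of a clone divided by expression of the parent gene."""
    return RLU[clone] / RLU[parent]

gr_fold = fold_over_parent("GRver5.1")
rd_fold = fold_over_parent("RDver5.1")
rd156_fold = fold_over_parent("RD156-1H9")
```

The GR and RD ratios round to roughly 1900-fold and 40-fold, matching the figures in the text; the same calculation gives a ratio above 100 for RD156-1H9.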



Table 9

Synthetic Click Beetle Gene Expression

Vector                 Percent of control vector
YG-control             100
RD-control             100
GR-control             100
RD156-1H9 control      100
YG-basic               3.3
RD-basic               1.0
GR-basic               0.2
RD156-1H9 basic        0.3
YG-promoter            4.2
RD-promoter            15.1
GR-promoter            5.7
RD156-1H9 promoter     15.5
YG-enhancer            51.5
RD-enhancer            2.8
GR-enhancer            1.4
RD156-1H9 enhancer     0.3



Example 3

Synthetic Renilla Luciferase Nucleic Acid Molecule

The synthetic Renilla luciferase genes prepared include 1) an introduced
Kozak sequence, 2) codon usage optimized for mammalian (human) expression,
3) a reduction or elimination of unwanted restriction sites, 4) removal of
prokaryotic regulatory sites (ribosome binding site and TATA box), 5) removal
of splice sites and poly(A) addition sites, and 6) a reduction or elimination of
mammalian transcriptional factor binding sequences.

The process of computer-assisted design of synthetic Renilla luciferase
genes by iterative rounds of codon optimization and removal of transcription
factor binding sites and other regulatory sites as well as restriction sites can be
described in three steps:

1. Using the wild type Renilla luciferase gene as the parent gene, codon usage
was optimized, one amino acid was changed (T→A) to generate a Kozak
consensus sequence, and undesired restriction sites were eliminated, thereby
creating synthetic gene Rlucver1.

2. Remove prokaryotic regulatory sites, splice sites, poly(A) sites and
transcription factor (TF) binding sites (first pass). Then remove newly
created TF binding sites. Then remove newly created undesired restriction
enzyme sites, prokaryotic regulatory sites, splice sites, and poly(A) sites
without introducing new TF binding sites. This created Rlucver2.

3. Change 3 bases of Rlucver2, thereby creating Rluc-final.

4. The actual gene was then constructed from synthetic oligonucleotides
corresponding to the Rluc-final designed sequence. All mutations resulting
from the assembly or PCR process were corrected. This gene is Rluc-final
(SEQ ID NO:22) and encodes the amino acid sequence of SEQ ID NO:227.

Codon Selection

Starting with the Renilla reniformis luciferase sequence in Genbank
(Accession No. M63501, SEQ ID NO:19), codons were selected based on codon
usage for optimal expression in human cells and to avoid E. coli low-usage
codons. The best codon for expression in human cells (or the best two codons if
found at a similar frequency) was chosen for all amino acids with more than one
codon (Wada et al., 1990):



Arg: CGC         Lys: AAG
Leu: CTG         Asn: AAC
Ser: TCT/AGC     Gln: CAG
Thr: ACC         His: CAC
Pro: CCA/CCT     Glu: GAG
Ala: GCC         Asp: GAC
Gly: GGC         Tyr: TAC
Val: GTG         Cys: TGC
Ile: ATC/ATT     Phe: TTC

In cases where two codons were selected for one amino acid, they were
used in an alternating fashion. To meet other criteria for the synthetic gene, the
initial optimal codon selection was modified to some extent later. For example,
introduction of a Kozak sequence required the use of GCT for Ala at amino acid
position 2 (see below).
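
The initial codon selection, including the alternating use of two codons, can be sketched as a simple back-translation. This is only an illustration of the first pass under stated assumptions: the function name is hypothetical, Met and Trp are given their single codons, alternation is applied per amino acid along the sequence, and none of the later adjustments (Kozak sequence, site removal) are modeled.

```python
from itertools import cycle

# Preferred codons from the list above; Met and Trp have only one codon each.
PREFERRED = {
    "R": ["CGC"], "K": ["AAG"], "L": ["CTG"], "N": ["AAC"],
    "S": ["TCT", "AGC"], "Q": ["CAG"], "T": ["ACC"], "H": ["CAC"],
    "P": ["CCA", "CCT"], "E": ["GAG"], "A": ["GCC"], "D": ["GAC"],
    "G": ["GGC"], "Y": ["TAC"], "V": ["GTG"], "C": ["TGC"],
    "I": ["ATC", "ATT"], "F": ["TTC"], "M": ["ATG"], "W": ["TGG"],
}

def back_translate(protein: str) -> str:
    """Assign preferred codons, alternating where two codons are listed."""
    # A fresh cycling iterator per amino acid gives the alternation.
    pickers = {aa: cycle(codons) for aa, codons in PREFERRED.items()}
    return "".join(next(pickers[aa]) for aa in protein.upper())

first_pass = back_translate("MASKVYDPEQRK")
```

For a run of three serines this yields TCT, AGC, TCT in turn. The real design then deviated from this first pass wherever other criteria required it, as the text notes for Ala at position 2.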

The following low-usage codons in mammalian cells were not used
unless needed: Arg: CGA, CGU; Leu: CTA, UUA; Ser: TCG; Pro: CCG;
Val: GTA; and Ile: ATA. The following low-usage codons in E. coli were also
avoided when reasonable (note that 3 of these match the low-usage list for
mammalian cells): Arg: CGA/CGG/AGA/AGG; Leu: CTA; Pro: CCC; Ile:
ATA.
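
A screen for the low-usage codons listed above might look like the following sketch. It is an assumption-laden illustration rather than the tool actually used: the function name is hypothetical, U is treated as T, and the example input is the coding portion of the sense strand primer given later in this Example (SEQ ID NO:236, starting at the ATG).

```python
# Low-usage codon lists from the text (RNA "U" written as DNA "T").
LOW_MAMMAL = {"CGA", "CGT", "CTA", "TTA", "TCG", "CCG", "GTA", "ATA"}
LOW_ECOLI = {"CGA", "CGG", "AGA", "AGG", "CTA", "CCC", "ATA"}

def low_usage_codons(cds: str) -> list[tuple[int, str]]:
    """Return (codon number, codon) for each in-frame low-usage codon."""
    cds = cds.upper().replace("U", "T")
    flagged = LOW_MAMMAL | LOW_ECOLI
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in flagged:
            hits.append((i // 3 + 1, codon))  # 1-based codon numbering
    return hits

shared = LOW_MAMMAL & LOW_ECOLI
hits = low_usage_codons("ATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA")
```

The intersection of the two sets contains CGA, CTA and ATA, the three shared codons the text mentions. In the example input the only flagged codon is CCC (on the E. coli list), at codon 8.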

Introduction of Kozak Sequences

The Kozak sequence: 5' aaccATGGCT 3' (SEQ ID NO:293) (the Nco I
site is underlined, the coding region is shown in capital letters) was introduced to
the synthetic Renilla luciferase gene. The introduction of the Kozak sequence
changes the second amino acid from Thr to Ala (GCT).

Removal of undesired restriction sites

REBASE ver. 808 (updated August 1, 1998; Restriction Enzyme Database;
www.neb.com/rebase) was employed to identify undesirable restriction sites as
described in Example 1. The following undesired restriction sites (in addition to
those described in Example 1) were removed according to the process described
in Example 1: EcoICR I, Nde I, Nsi I, Sph I, Spe I, Xma I, Pst I.

The version of Renilla luciferase (Rluc) which incorporates all these
changes is Rlucver1.

Removal of prokaryotic (E. coli) regulatory sequences, splice sites, and poly(A)
sites

The priority and process for eliminating transcription regulation sites was
as described in Example 1.

Removal of TF binding sites

The same process, tools, and criteria were used as described in Example
1; however, the newer version 3.3 of the TRANSFAC database was employed.

After removing prokaryotic regulatory sequences, splice sites and
poly(A) sites from Rlucver1, the first search for TF binding sites identified about
60 hits. All sites were eliminated with the exception of three that could not be
removed without altering the amino acid sequence of the synthetic Renilla gene:

1. site at position 63 composed of two codons for W
(TGGTGG), for CAC-binding protein T00076;

2. site at position 522 composed of codons for KMV
(AANATGGTN), for myc-DF1 T00517;

3. site at position 885 composed of codons for EMG
(GARATGGGN), for myc-DF1 T00517.

The subsequent second search for (newly introduced) TF binding sites yielded
about 20 hits. All new sites were eliminated, leaving only the three sites
described above. Finally, any newly introduced restriction sites, prokaryotic
regulatory sequences, splice sites and poly(A) sites were removed without
introducing new TF binding sites if possible.
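
The three retained sites are written with IUPAC ambiguity codes (N = any base, R = A or G). A search for such degenerate motifs can be sketched with a regular expression, as below. This is a generic illustration and not the TRANSFAC or TESS search itself; the function name and the short test sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes used in the motifs in this Example
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}

def find_sites(motif: str, seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched bases) for every motif occurrence."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + body + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]

emg_hits = find_sites("GARATGGGN", "TTGAGATGGGCAA")  # hypothetical sequence
ww_hits = find_sites("TGGTGG", "TGGTGGTGG")          # hypothetical sequence
```

Because every synonymous codon choice for Trp-Trp, Lys-Met-Val and Glu-Met-Gly still matches the corresponding degenerate motif, such sites cannot be removed without changing the encoded amino acids, which is the reason given for retaining them.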

Rlucver2 was obtained (SEQ ID NOs:21 and 226).

As in Example 1, lower stringency search parameters were specified for
the TESS filtered string search to further evaluate the synthetic Renilla gene.

With the LLH reduced from 10 to 9 and the minimum element length
reduced from 5 to 4, the TESS filtered string search did not show any new hits.
When, in addition to the parameter changes listed above, the organism
classification was expanded from "mammalia" to "chordata", the search yielded
only four more TF binding sites. When the Min LLH was further reduced to
between 8 and 0, the search showed two additional 5-base sites (MAMAG and
CTKTK) which combined had four matches in Rlucver2, as well as several
4-base sites. Also as in Example 1, Rlucver2 was checked for hits to entries in the
EPD (Eukaryotic Promoter Database, Release 45). Three hits were determined
(one to Mus musculus promoter H^L'd (Cell 44, 261 (1986)), one to Herpes
Simplex Virus type 1 promoter b'g^J kb, and one to Homo sapiens DHFR
promoter (J. Mol. Biol., 176, 169 (1984))). However, no further changes were
made to Rlucver2.

Summary of Properties for Rlucver2

- All 30 low usage codons were eliminated. The introduction of a Kozak
sequence changed the second amino acid from Thr to Ala;
- base composition: 55.7% GC (Renilla wild-type parent gene: 36.5%);
- one undesired restriction site could not be eliminated: EcoR V at position
488;
- the synthetic gene had no prokaryotic promoter sequence but one
potentially functional ribosome binding site (RBS) at positions 867-73
(about 13 bases upstream of a Met codon) could not be eliminated;
- all poly(A) addition sites were eliminated;
- splice sites: 2 donor splice sites could not be eliminated (both share the
amino acid sequence MGK);
- TF sites: all sites with a consensus of >4 unambiguous bases were
eliminated (about 280 TF binding sites were removed) with 3 exceptions
due to the preference to avoid changes to the amino acid sequence.

Synthetic Renilla luciferase sequences are shown in Figures 7 and 8. A codon
usage comparison is shown in Figure 9.
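
The base composition figure in the summary above is a GC percentage. A minimal sketch of that calculation follows; the function name is hypothetical and the inputs shown are toy sequences, since the full gene sequences appear only in the Figures.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    return 100.0 * gc / len(seq)

# Toy inputs only; the summary's 55.7% and 36.5% refer to the full genes.
all_gc = gc_percent("GGCC")
half_gc = gc_percent("ATGC")
```

Shifting codon choice toward the preferred human codons listed earlier, most of which end in G or C, is consistent with the higher GC content reported for the synthetic gene.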

When introduced into pGL3, Rluc-final has a Kozak sequence
(CACCATGGCT). The changes in Rluc-final relative to Rlucver2 were
introduced during gene assembly. One change was at position 619, a C to an A,
which eliminated a eukaryotic promoter sequence and reduced the stability of a
hairpin structure in the corresponding oligonucleotide employed to assemble the
gene. Other changes included a change from CGC to AGA at positions 218-220
(resulted in a better oligonucleotide for PCR).

Gene Assembly Strategy

The gene assembly protocol employed for the synthetic Renilla luciferase
was similar to that described in Example 1. The oligonucleotides employed are
shown in Figure 10.

Sense Strand primer:
5' AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA 3' (SEQ
ID NO:236)

Anti-sense Strand primer:
5' GCTCTAGAATTACTGCTCGTTCTTCAGCACGCGCTCCACG 3' (SEQ
ID NO:237)

The resulting synthetic gene fragment was cloned into a pRAM vector
using Nco I and Xba I. Two clones having the correct size insert were
sequenced. Four to six mutations were found in the synthetic gene from each
clone. These mutations were fixed by site-directed mutagenesis (Gene Editor
from Promega Corp., Madison, WI) and swapping the correct regions between
these two genes. The corrected gene was confirmed by sequencing.

Other Vectors

To prepare an expression vector for the synthetic Renilla luciferase gene
in a pGL3-control vector backbone, 5 µg of pGL3-control was digested with
Nco I and Xba I in 50 µl final volume with 2 µl of each enzyme and 5 µl 10X
buffer B (nanopure water was used to fill the volume to 50 µl). The digestion
reaction was incubated at 37°C for 2 hours, and the whole mixture was run on a
1% agarose gel in 1X TAE. The desired vector backbone fragment was purified
using Qiagen's QIAquick gel extraction kit.

The native Renilla luciferase gene fragment was cloned into pGL3-control
vector using two oligonucleotides, Nco I-RL-F and Xba I-RL-R, to PCR
amplify native Renilla luciferase gene using pRL-CMV as the template. The
sequence for Nco I-RL-F is
5'-CGCTAGCCATGGCTTCGAAAGTTTATGATCC-3' (SEQ ID NO:238); the
sequence for Xba I-RL-R is
5'-GGCCAGTAACTCTAGAATTATTGTT-3' (SEQ ID NO:239). The PCR
reaction was carried out as follows:
Reaction mixture (for 100 µl):

DNA template (Plasmid)     1.0 µl (1.0 ng/µl final)
10X Rec. Buffer            10.0 µl (Stratagene Corp.)
dNTPs (25 mM each)         1.0 µl (final 250 µM)
Primer 1 (10 µM)           2.0 µl (0.2 µM final)
Primer 2 (10 µM)           2.0 µl (0.2 µM final)
Pfu DNA Polymerase         2.0 µl (2.5 U/µl, Stratagene Corp.)
                           82.0 µl double distilled water
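
The stated final concentrations follow from simple dilution arithmetic (final = stock x volume added / total volume). The sketch below checks the reaction mixture on that basis, assuming primer stocks of 10 µM and a dNTP stock of 25 mM each; the names used are illustrative.

```python
def final_conc(stock: float, volume_ul: float, total_ul: float) -> float:
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul

# Component volumes in µl, as listed in the reaction mixture
volumes_ul = {
    "template": 1.0, "10X buffer": 10.0, "dNTPs": 1.0,
    "primer 1": 2.0, "primer 2": 2.0, "Pfu polymerase": 2.0,
    "water": 82.0,
}
total_ul = sum(volumes_ul.values())

dntp_final_um = final_conc(25_000.0, volumes_ul["dNTPs"], total_ul)   # µM
primer_final_um = final_conc(10.0, volumes_ul["primer 1"], total_ul)  # µM
buffer_final_x = final_conc(10.0, volumes_ul["10X buffer"], total_ul) # X
pfu_units = 2.5 * volumes_ul["Pfu polymerase"]                        # U
```

With these inputs the volumes sum to 100 µl, the buffer ends at 1X, each dNTP at 250 µM, each primer at 0.2 µM, and the reaction contains 5 U of polymerase.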



PCR Reaction: heat 94°C for 2 minutes; (94°C for 20 seconds;
65°C for 1 minute; 72°C for 2 minutes; then 72°C for 5 minutes) x 25 cycles,
then incubate on ice. The PCR amplified fragment was cut from a gel, and the
DNA purified and stored at -20°C.

To introduce native Renilla luciferase gene fragment into pGL3-control
vector, 5 µg of the PCR product of the native Renilla luciferase gene
(RAM-RL-synthetic) was digested with Nco I and Xba I. The desired Renilla
luciferase gene fragment was purified and stored at -20°C.

Then 100 ng of insert and 100 ng of pGL3-control vector backbone were
digested with restriction enzymes Nco I and Xba I and ligated together. Then 2
µl of the ligation mixture was transformed into JM109 competent cells. Eight
ampicillin-resistant clones were picked and their DNA isolated. DNA from
each positive clone of pGL3-control-native and pGL3-control-synthetic was
purified. The correct sequences for the native gene and the synthetic gene in the
vectors were confirmed by DNA sequencing.

To determine whether the synthetic Renilla luciferase gene has improved
expression in mammalian cells, the gene was cloned into the mammalian
expression vector pGL3-control vector under the control of SV40 promoter and
SV40 early enhancer (Fig. 13A). The native Renilla luciferase gene was also
cloned into the pGL3-control vector so that the expression from synthetic gene
and the native gene could be compared. The expression vectors were then
transfected into four common mammalian cell lines (CHO, NIH3T3, HeLa and
CV-1; Table 10), and the expression levels compared between the vectors with
the synthetic gene versus the native gene. Two different amounts of DNA were
used to ascertain that expression from the synthetic gene is
consistently increased at different expression levels. The results show a 70-600
fold increase of expression for the synthetic Renilla luciferase gene in these cells
(Table 10).

Table 10

Enhanced Synthetic Renilla Gene Expression

Cell Type    Amount Vector    Fold Expression Increase
CHO          0.2 µg           142
             2.8 µg           145
NIH3T3       0.2 µg           326
             2.0 µg           593
HeLa         0.2 µg           185
             1.0 µg           103
CV-1         0.2 µg           68
             2.0 µg           72



One important advantage of a luciferase reporter is its short protein
half-life. The enhanced expression could also result from an extended protein
half-life and, if so, this would be a disadvantage of the new gene. This possibility
was ruled out by a cycloheximide chase ("CHX Chase") experiment (Figure 14),
which demonstrated that the humanized Renilla luciferase gene did not increase
protein half-life.

To ensure that the increase in expression is not limited to one expression
vector backbone and is not promoter specific and/or cell specific, a synthetic
Renilla gene (Rluc-final) as well as native Renilla gene were cloned into different
vector backbones and under different promoters (Figure 13B). The synthetic gene
always exhibited increased expression compared to its wild-type counterpart
(Table 11).



Table 11

Renilla Gene Expression: native v. synthetic (Rluc-final)

Vector                         NIH-3T3       HeLa          CHO
pRL-tk, native                 3,834.6       922.4         7,671.9
pRL-tk, synthetic              13,252.5      9,040.2       41,743.5
pRL-CMV, native                168,062.2     842,482.5     153,539.5
pRL-CMV, synthetic             2,168,129     8,440,306     2,532,576
pRL-SV40, native               224,224.4     346,787.6     85,323.6
pRL-SV40, synthetic            1,469,588     2,632,510     1,422,830
pRL-null, native               A 1 1 1       431. /        2,434
pRL-null, synthetic            9,151.17      2,439         28,317.1
pRGL3b, native                 12            21.8          17
pRGL3b, synthetic              130.5         212.4         1,094.5
pRGL3-tk, native               27.9          155.5         186.4
pRGL3-tk, synthetic            6,778.2       8,782.5       9,685.9
pRL-tk no intron, native       31.8          165           93.4
pRL-tk no intron, synthetic    6,665.5       6,379         21,433.1
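
The statement that the synthetic gene always exhibited increased expression can be checked by dividing each synthetic value in Table 11 by its native counterpart. The sketch below does this for the rows whose values are fully legible (the pRL-null rows are omitted); the three numbers per entry follow the column order of the table, and the variable names are illustrative.

```python
# (native, synthetic) expression values per vector, in table column order
TABLE_11 = {
    "pRL-tk": ((3834.6, 922.4, 7671.9), (13252.5, 9040.2, 41743.5)),
    "pRL-CMV": ((168062.2, 842482.5, 153539.5), (2168129, 8440306, 2532576)),
    "pRL-SV40": ((224224.4, 346787.6, 85323.6), (1469588, 2632510, 1422830)),
    "pRGL3b": ((12, 21.8, 17), (130.5, 212.4, 1094.5)),
    "pRGL3-tk": ((27.9, 155.5, 186.4), (6778.2, 8782.5, 9685.9)),
    "pRL-tk no intron": ((31.8, 165, 93.4), (6665.5, 6379, 21433.1)),
}

# Fold increase (synthetic / native) for each vector and cell-line column
fold = {
    vector: tuple(s / n for n, s in zip(native, synthetic))
    for vector, (native, synthetic) in TABLE_11.items()
}
smallest_fold = min(min(values) for values in fold.values())
```

Every ratio in this subset is greater than 1, with the smallest increases seen for the pRL-tk backbone and far larger ones for the promoterless and tk-driven pRGL3 backbones.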



Table 12

Renilla Luciferase Expression in Mammalian Cells

                          Percent of control vector
Vector                    CHO cells    NIH3T3 cells    HeLa cells
pRL-control native        100          100             100
pRL-control synthetic     100          100             100
pRL-basic native          4.1          5.6             0.2
pRL-basic synthetic       0.4          0.1             0.0
pRL-promoter native       5.9          7.8             0.6
pRL-promoter synthetic    15.0         9.9             1.1
pRL-enhancer native       42.1         123.9           52.7
pRL-enhancer synthetic    2.6          1.5             5.4


(Vector backbones illustrated in Figure 13A)

With reduced spurious expression the synthetic gene should exhibit less
basal level transcription in a promoterless vector. The synthetic and native
Renilla luciferase genes were cloned into the pGL3-basic vector to compare the
basal level of transcription. Because the synthetic gene itself has increased
expression efficiency, the activity from the promoterless vector cannot be
compared directly to judge the difference in basal transcription. Rather, this is
taken into consideration by comparing the percentage of activity from the
promoterless vector in reference to the control vector (expression from the basic
vector divided by the expression in the fully functional expression vector with
both promoter and enhancer elements). The data demonstrate that the synthetic
Renilla luciferase has a lower level of basal transcription than the native gene
(Table 12).
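
Because the Table 12 values are already expressed as a percentage of the control vector, the native and synthetic genes can be compared by a simple ratio of those percentages. The following sketch uses the pRL-basic rows; the function name is hypothetical, and a zero synthetic value is reported as undefined rather than as an infinite ratio.

```python
from typing import Optional

def fold_reduction(native_pct: float, synthetic_pct: float) -> Optional[float]:
    """Ratio of native to synthetic percent-of-control; None if undefined."""
    if synthetic_pct == 0:
        return None  # rounded to 0.0 in the table, so no ratio can be formed
    return native_pct / synthetic_pct

# pRL-basic rows of Table 12: (native, synthetic) percent of control
BASIC = {"CHO": (4.1, 0.4), "NIH3T3": (5.6, 0.1), "HeLa": (0.2, 0.0)}

basal = {cell: fold_reduction(n, s) for cell, (n, s) in BASIC.items()}
```

On this normalized basis the basal signal from the synthetic gene is roughly 10-fold lower in CHO cells and more than 50-fold lower in NIH3T3 cells; the HeLa value cannot be expressed as a ratio because the synthetic entry is listed as 0.0.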

It is well known to those skilled in the art that an enhancer can
substantially stimulate promoter activity. To test whether the synthetic gene has
reduced risk of inappropriate transcriptional characteristics, the native and
synthetic gene were introduced into a vector with an enhancer element
(pGL3-enhancer vector). Because the synthetic gene has higher expression
efficiency, the activity of both cannot be compared directly to compare the level
of transcription in the presence of the enhancer. However, this is taken into
account by using the percentage of activity from enhancer vector in reference to
the control vector (expression in the presence of enhancer divided by the
expression in the fully functional expression vector with both promoter and
enhancer elements). Such results show that when the native gene is present, the
enhancer alone is able to stimulate transcription from 42-124% of the control.
However, when the native gene is replaced by the synthetic gene in the same
vector, the activity only constitutes 1-5% of the value when the same enhancer
and a strong SV40 promoter are employed. This clearly demonstrates that the
synthetic gene has reduced risk of spurious expression (Table 12).

The synthetic Renilla gene (Rluc-final) was used in in vitro systems to 
compare translation efficiency with the native gene. In a T7 quick coupled 
transcription/translation system (Promega Corp., Madison, WI), pRL-null native 
plasmid (having the native Renilla luciferase gene under the control of the T7 
promoter) or the same amount of pRL-null-synthetic plasmid (having the 
synthetic Renilla luciferase gene under the control of the T7 promoter) was 
added to the TNT reaction mixture and luciferase activity was measured every 
5 minutes up to 60 minutes. The Dual Luciferase assay kit (Promega Corp.) was 
used to measure Renilla luciferase activity. The data showed that improved 
expression was obtained from the synthetic gene (Figure 15 A, B). To provide 
further evidence of the increased translation efficiency of the synthetic gene, 
RNA was prepared by an in vitro transcription system, then purified. pRL-null 
(native or synthetic) vectors were linearized with BamH I. The DNA was purified 
by multiple phenol-chloroform extractions followed by ethanol precipitation. An 
in vitro T7 transcription system was employed to prepare RNAs. The DNA 
template was removed by using RNase-free DNase, and RNA was purified by 
phenol-chloroform extraction followed by multiple isopropanol precipitations. 
The same amount of purified RNA, either for the synthetic gene or the native 
gene, was then added to a rabbit reticulocyte lysate (Figure 15 C, D) or wheat 
germ lysate (Figure 15 E, F). Again, the synthetic Renilla luciferase gene RNA 
produced more luciferase than the native one. These data suggest that the 
translation efficiency is improved by the synthetic sequence. To determine why 
the synthetic gene was highly expressed in wheat germ, plant codon usage was 
determined. The lowest usage codons in higher plants coincided with those in 
mammals. 
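
A codon usage comparison of this kind can be sketched in a few lines. The following is a minimal illustration only, not the procedure used to obtain the result above: the function names and the toy coding sequence are hypothetical, and real usage tables would be computed from many genes of each organism.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def usage_per_thousand(cds):
    """Return codon frequencies expressed per 1000 codons."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def lowest_usage_codons(freqs, n):
    """Return the n least-used codons (ties broken alphabetically)."""
    return sorted(freqs, key=lambda codon: (freqs[codon], codon))[:n]


def shared_rare_codons(freqs_a, freqs_b, n):
    """Codons that are among the n least-used in both usage tables."""
    return set(lowest_usage_codons(freqs_a, n)) & set(lowest_usage_codons(freqs_b, n))


# Hypothetical toy sequence: ATG GCT GCT GCC GCG CTG CTG TTA TAA
toy_cds = "ATGGCTGCTGCCGCGCTGCTGTTATAA"
toy_counts = codon_counts(toy_cds)
```

Applied to usage tables for higher plants and for mammals, `shared_rare_codons` would list the low-usage codons common to both, which is the overlap the paragraph above refers to.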

Reporter gene assays are widely used to study transcriptional regulation 
events. This is often carried out in co-transfection experiments, in which, along 
with the primary reporter construct containing the testing promoter, a second 
control reporter under a constitutive promoter is transfected into cells as an 
internal control to normalize experimental variations including transfection 
efficiencies between the samples. Control reporter signal, potential promoter 
cross talk between the control reporter and primary reporter, as well as potential 
regulation of the control reporter by experimental conditions, are important 
aspects to consider for selecting a reliable co-reporter vector. 
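
The normalization described above amounts to dividing the primary reporter signal by the control reporter signal for each sample. The sketch below illustrates that arithmetic under stated assumptions: the function names and the light-unit readings are hypothetical and are not data from this disclosure.

```python
def normalize_reporter(primary_signal, control_signal):
    """Divide each primary reporter reading by the matching control reading."""
    if len(primary_signal) != len(control_signal):
        raise ValueError("each sample needs one primary and one control reading")
    normalized = []
    for primary, control in zip(primary_signal, control_signal):
        if control <= 0:
            raise ValueError("control reporter signal must be positive")
        normalized.append(primary / control)
    return normalized


def relative_to_reference(normalized, reference_index=0):
    """Express normalized ratios relative to one reference sample."""
    reference = normalized[reference_index]
    return [value / reference for value in normalized]


# Hypothetical readings for three samples (relative light units).
firefly = [1000.0, 2000.0, 1500.0]   # primary reporter
renilla = [100.0, 400.0, 150.0]      # co-transfected control reporter
ratios = normalize_reporter(firefly, renilla)
relative = relative_to_reference(ratios)
```

The ratio is only meaningful if the control reporter itself is unaffected by the treatment, which is why regulation of the control reporter by experimental conditions matters when a co-reporter vector is selected.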
As described above, vector constructs were made by cloning the synthetic 
Renilla luciferase gene into different vector backbones under different 
promoters. All the constructs showed higher expression in the three mammalian 
cell lines tested (Table 11). Thus, with better expression efficiency, the synthetic 
Renilla luciferase gives a higher signal when transfected into mammalian cells. 

Because a higher signal is obtained, less promoter activity is required to 
achieve the same reporter signal, which reduces the risk of promoter interference. 
CHO cells were transfected with 50 ng pGL3-control (firefly luc+) plus one of 5 
different amounts of native pRL-TK plasmid (50, 100, 500, 1000, or 2000 ng) or 
synthetic pRL-TK (5, 10, 50, 100, or 200 ng). To each transfection, pUC19 
carrier DNA was added to a total of 3 µg DNA. Figure 16 shows that 10-fold less 
synthetic pRL-TK DNA gives a signal similar to or greater than that of the 
native gene, with a reduced risk of inhibiting expression from the primary 
reporter pGL3-control. 
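
The amount of pUC19 carrier needed in each transfection follows directly from the plasmid amounts given above. The sketch below works through that arithmetic; the function and variable names are illustrative assumptions, and only the DNA amounts are taken from the text.

```python
TOTAL_DNA_NG = 3000          # 3 µg total DNA per transfection
PGL3_CONTROL_NG = 50         # firefly luc+ primary reporter
NATIVE_PRL_TK_NG = [50, 100, 500, 1000, 2000]
SYNTHETIC_PRL_TK_NG = [5, 10, 50, 100, 200]


def carrier_ng(total_ng, *plasmid_ng):
    """Carrier DNA (ng) needed to bring the plasmids up to the total amount."""
    carrier = total_ng - sum(plasmid_ng)
    if carrier < 0:
        raise ValueError("plasmid DNA already exceeds the total amount")
    return carrier


native_carrier = [carrier_ng(TOTAL_DNA_NG, PGL3_CONTROL_NG, ng) for ng in NATIVE_PRL_TK_NG]
synthetic_carrier = [carrier_ng(TOTAL_DNA_NG, PGL3_CONTROL_NG, ng) for ng in SYNTHETIC_PRL_TK_NG]

# Each synthetic amount is one tenth of the corresponding native amount.
ratios = [n / s for n, s in zip(NATIVE_PRL_TK_NG, SYNTHETIC_PRL_TK_NG)]
```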

Experimental treatment sometimes may activate cryptic sites within the 
gene and cause induction or suppression of the co-reporter expression, which 
would compromise its function as a co-reporter for normalization of transfection 
efficiencies. One example is that TPA induces expression of co-reporter vectors 
harboring the wild-type gene in transfected MCF-7 cells. 500 ng pRL-TK 
(native), 5 µg native and synthetic pRG-B, 2.5 µg native and synthetic pRG-TK 
were transfected per well of MCF-7 cells. 100 ng/well pGL3-control (firefly 
luc+) was co-transfected with all RL plasmids. Carrier DNA, pUC19, was used 
to bring the total DNA transfected to 5.1 µg/well. 15.3 µl TransFast Transfection 
Reagent (Promega Corp., Madison, WI) was added per well. Sixteen hours later, 
cells were trypsinized, pooled and split into six wells of a 6-well dish and 
allowed to attach to the well for 8 hours. Three wells were then treated with 
0.2 nM of the tumor promoter TPA (phorbol-12-myristate-13-acetate, 
Calbiochem #524400-S), and three wells were mock treated with 20 µl DMSO. 
Cells were harvested with 0.4 ml Passive Lysis Buffer 24 hours post TPA 
addition. The results showed that by using the synthetic gene, undesirable 
change of co-reporter expression by experimental stimuli can be avoided (Table 
13). This demonstrates that using the synthetic gene can reduce the risk of 
anomalous expression. 
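
The fold induction reported in Table 13 is the TPA-treated signal divided by the untreated signal for the same vector. A minimal sketch of that calculation follows. The light-unit values are those listed in Table 13; the dictionary layout and function name are illustrative assumptions, and the computed ratios may differ from the tabulated values in the last digit because of rounding.

```python
# (untreated, TPA-treated) Renilla luciferase light units from Table 13.
TABLE_13_RLU = {
    "pRL-tk (native)": (184, 812),
    "pRG-B (native)": (1, 8),
    "pRG-B (final)": (132, 195),
    "pRG-tk (native)": (44, 192),
    "pRG-tk (final)": (12816, 11347),
}


def fold_induction(untreated, treated):
    """Ratio of treated to untreated signal; 1.0 means no change."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return treated / untreated


folds = {vector: fold_induction(*rlu) for vector, rlu in TABLE_13_RLU.items()}

for vector, fold in folds.items():
    print(f"{vector:18s} {fold:.2f}")
```

A ratio close to 1 for the synthetic ("final") constructs, against ratios above 4 for the native constructs, is what supports the conclusion that the synthetic gene is less responsive to TPA.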

Table 13 
TPA Induction 

Vector                          RLU        Fold Induction 
pRL-tk untreated (native)       184 
pRL-tk TPA treated (native)     812        4.4 
pRG-B untreated (native)        1 
pRG-B TPA treated (native)      8          8.0 
pRG-B untreated (final)         132 
pRG-B TPA treated (final)       195        1.47 
pRG-tk untreated (native)       44 
pRG-tk TPA treated (native)     192        4.36 
pRG-tk untreated (final)        12,816 
pRG-tk TPA treated (final)      11,347     0.88 

described in relation to certain preferred embodiments thereof, and many details 
have been set forth for purposes of illustration, it will be apparent to those skilled 
in the art that the invention is susceptible to additional embodiments and that 
certain of the details herein may be varied considerably without departing from 
the basic principles of the invention. 

WHAT IS CLAIMED IS: 



1. A synthetic nucleic acid molecule comprising at least 300 nucleotides of 
a coding region for a polypeptide, having a codon composition differing 
at more than 25% of the codons from a wild type nucleic acid sequence 
encoding a polypeptide, and having at least 3-fold fewer transcription 
regulatory sequences relative to the average number of such sequences 
resulting from random selections of codons at the codons which differ, 
wherein the transcription regulatory sequences are selected from the 
group consisting of transcription factor binding sequences, intron splice 
sites, poly(A) addition sites and promoter sequences, and wherein the 
polypeptide encoded by the synthetic nucleic acid molecule has at least 
85% sequence identity to the polypeptide encoded by the wild type 
nucleic acid sequence. 

2. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has at least 5-fold fewer transcription regulatory 
sequences. 

3. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 35% of the codons. 

4. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 45% of the codons. 

5. The synthetic nucleic acid molecule of claim 1 wherein the codon 
composition of the synthetic nucleic acid molecule differs from the wild 
type nucleic acid sequence at more than 55% of the codons. 

6. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ are ones that are preferred codons of a desired host 
cell. 

7. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule encodes a reporter molecule. 

8. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule encodes a selectable marker protein. 

9. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule encodes a luciferase. 

10. The synthetic nucleic acid molecule of claim 9 wherein the wild type 
nucleic acid sequence encodes a Renilla luciferase. 

11. The synthetic nucleic acid molecule of claim 9 wherein the wild type 
nucleic acid sequence encodes a beetle luciferase. 

12. The synthetic nucleic acid molecule of claim 11 wherein the synthetic 
nucleic acid molecule encodes the amino acid valine at position 224. 

13. The synthetic nucleic acid molecule of claim 11 wherein the synthetic 
nucleic acid molecule encodes the amino acid histidine at position 224, 
histidine at position 247, isoleucine at position 346, glutamine at position 
348, or any combination thereof. 



14. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those 
which are employed more frequently in mammals. 

15. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those 
which are preferred codons in humans. 

16. The synthetic nucleic acid molecule of claim 1 wherein the majority of 
codons which differ in the synthetic nucleic acid molecule are those 
which are preferred codons in plants. 

17. The synthetic nucleic acid molecule of claim 9 wherein the synthetic 
nucleic acid molecule comprises SEQ ID NO:21 (Rlucver2) or SEQ ID 
NO:22 (Rluc-final). 

18. The synthetic nucleic acid molecule of claim 9 wherein the synthetic 
nucleic acid molecule comprises SEQ ID NO:7 (GRver5), SEQ ID NO:8 
(GRver6), SEQ ID NO:9 (GRver5.1), or SEQ ID NO:297 (GRver5.1). 

19. The synthetic nucleic acid molecule of claim 9 wherein the synthetic 
nucleic acid molecule comprises SEQ ID NO:14 (RDver5), SEQ ID 
NO:15 (RDver7), SEQ ID NO:16 (RDver5.1), SEQ ID NO:299 
(RDver5.1), SEQ ID NO:17 (RDver5.2), SEQ ID NO:18 (RD156-1H9) 
or SEQ ID NO:301 (RD156-1H9). 

20. The synthetic nucleic acid molecule of claim 15 wherein the majority of 
codons which differ are the human codons CGC, CTG, TCT, AGC, 
ACC, CCA, CCT, GCC, GGC, GTG, ATC, ATT, AAG, AAC, CAG, 
CAC, GAG, GAC, TAC, TGC and TTC. 

21. The synthetic nucleic acid molecule of claim 15 wherein the majority of 
codons which differ are the human codons CGC, CTG, TCT, ACC, 
CCA, GCC, GGC, GTC, and ATC or codons CGT, TTG, AGC, ACT, 
CCT, GCT, GGT, GTG and ATT. 



WO 02/16944                                                PCT/US01/26566 



22. The synthetic nucleic acid molecule of claim 16 wherein the majority of 
codons which differ are the plant codons CGC, CTT, TCT, TCC, ACC, 
CCA, CCT, GCT, GGA, GTG, ATC, ATT, AAG, AAC, CAA, CAC, 
GAG, GAC, TAC, TGC and TTC. 

23. The synthetic nucleic acid molecule of claim 16 wherein the majority of 
codons which differ are the plant codons CGC, CTT, TCT, ACC, CCA, 
GTC, GGA, GTC, and ATC or codons CGT, TGG, AGC, ACT, CCT, 
GCC, GGT, GTG and ATT. 

24. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule is expressed in a mammalian host cell at a level 
which is greater than that of the wild type nucleic acid sequence. 

25. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of CTG or TTG leucine- 
encoding codons. 

26. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of GTG or GTC valine- 
encoding codons. 

27. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of GGC or GGT glycine- 
encoding codons. 

28. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of ATC or ATT isoleucine- 
encoding codons. 

29. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of CCA or CCT proline- 
encoding codons. 

30. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of CGC or CGT arginine- 
encoding codons. 

31. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of AGC or TCT serine- 
encoding codons. 



32. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of ACC or ACT 
threonine-encoding codons. 



33. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule has an increased number of GCC or GCT alanine- 
encoding codons. 



34. The synthetic nucleic acid molecule of claim 1 wherein the codons in the 
synthetic nucleic acid molecule which differ encode the same amino 
acids as the corresponding codons in the wild type nucleic acid sequence. 

35. A plasmid comprising the synthetic nucleic acid molecule of claim 1. 



36. An expression vector comprising the synthetic nucleic acid molecule of 
claim 1 linked to a promoter functional in a cell. 



37. The expression vector of claim 36 wherein the synthetic nucleic acid 
molecule is operatively linked to a Kozak consensus sequence. 

38. The expression vector of claim 36 wherein the promoter is functional in a 
mammalian cell. 

39. The expression vector of claim 36 wherein the promoter is functional in a 
human cell. 

40. The expression vector of claim 36 wherein the promoter is functional in a 
plant cell. 

41. The expression vector of claim 36 wherein the expression vector further 
comprises a multiple cloning site. 

42. The expression vector of claim 41 wherein the expression vector 
comprises a multiple cloning site positioned between the promoter and 
the synthetic nucleic acid molecule. 

43. The expression vector of claim 41 wherein the expression vector 
comprises a multiple cloning site positioned downstream from the 
synthetic nucleic acid molecule. 

44. A host cell comprising the expression vector of claim 36. 

45. A reporter gene expression kit comprising, in suitable container means, 
the expression vector of claim 36. 

46. An isolated polypeptide encoded by SEQ ID NO:9 (GRver5.1) or SEQ 
ID NO:18 (RD156-1H9). 

47. A polynucleotide which hybridizes under stringent hybridization 
conditions to SEQ ID NO:22 (Rluc-final), SEQ ID NO:9 (GRver5.1), 
SEQ ID NO:18 (RD156-1H9), SEQ ID NO:297 (GRver5.1), SEQ ID 
NO:301 (RD156-1H9), or the complement thereof. 

48. A method to prepare a synthetic nucleic acid molecule comprising an 
open reading frame, comprising: 

a) altering a plurality of transcription regulatory sequences in a parent 
nucleic acid sequence which encodes a polypeptide having at least 100 
amino acids to yield a synthetic nucleic acid molecule which has at least 
3-fold fewer transcription regulatory sequences relative to the parent 
nucleic acid sequence, wherein the transcription regulatory sequences are 
selected from the group consisting of transcription factor binding 
sequences, intron splice sites, poly(A) addition sites, enhancer sequences 
and promoter sequences; and 

b) altering greater than 25% of the codons in the synthetic nucleic acid 
sequence which has a decreased number of transcription regulatory 
sequences to yield a further synthetic nucleic acid molecule, wherein the 
codons which are altered do not result in an increased number of 
transcription regulatory sequences, wherein the further synthetic nucleic 
acid molecule encodes a polypeptide with at least 85% amino acid 
sequence identity to the polypeptide encoded by the parent nucleic acid 
sequence. 

49. A method to prepare a synthetic nucleic acid molecule comprising an 
open reading frame, comprising: 

a) altering greater than 25% of the codons in a parent nucleic acid 
sequence which encodes a polypeptide having at least 100 amino acids to 
yield a codon-altered synthetic nucleic acid molecule, and 

b) altering a plurality of transcription regulatory sequences in the codon- 
altered synthetic nucleic acid molecule to yield a further synthetic nucleic 
acid molecule which has at least 3-fold fewer transcription regulatory 
sequences relative to a synthetic nucleic acid molecule with a random 
selection of codons at the codons which differ, wherein the transcription 
regulatory sequences are selected from the group consisting of 
transcription factor binding sequences, intron splice sites, poly(A) 
addition sites, enhancer sequences and promoter sequences, and wherein 
the further synthetic nucleic acid molecule encodes a polypeptide with at 
least 85% amino acid sequence identity to the polypeptide encoded by 
the parent nucleic acid sequence. 

50. The method of claim 48 or 49 wherein the parent nucleic acid sequence 
encodes a reporter molecule. 

51. The method of claim 48 or 49 wherein the parent nucleic acid sequence 
encodes a luciferase. 



52. The method of claim 48 or 49 wherein the synthetic nucleic acid 
molecule hybridizes under medium stringency hybridization conditions 
to the parent nucleic acid sequence. 

53. The method of claim 48 or 49 wherein the codons which are altered 
encode the same amino acid as the corresponding codons in the parent 
nucleic acid sequence. 

54. A synthetic nucleic acid molecule which is the further synthetic nucleic 
acid molecule prepared by the method of claim 48 or 49. 

55. A method for preparing at least two synthetic nucleic acid molecules 
which are codon distinct versions of a parent nucleic acid sequence which 
encodes a polypeptide, comprising: 

a) altering a parent nucleic acid sequence to yield a synthetic nucleic 
acid molecule having an increased number of a first plurality of codons 
that are employed more frequently in a selected host cell relative to the 
number of those codons in the parent nucleic acid sequence; and 

b) altering the parent nucleic acid sequence to yield a further synthetic 
nucleic acid molecule having an increased number of a second plurality 
of codons that are employed more frequently in the host cell relative to 
the number of those codons in the parent nucleic acid sequence, wherein 
the first plurality of codons is different than the second plurality of 
codons, and wherein the synthetic and the further synthetic nucleic acid 
molecules encode the same polypeptide. 

56. The method of claim 55 further comprising altering a plurality of 
transcription regulatory sequences in the synthetic nucleic acid molecule, 
the further synthetic nucleic acid molecule, or both, to yield at least one 
yet further synthetic nucleic acid molecule which has at least 3-fold fewer 
transcription regulatory sequences relative to the synthetic nucleic acid 
molecule, the further synthetic nucleic acid molecule, or both. 

57. The method of claim 55 further comprising altering at least one codon in 
the first synthetic sequence to yield a first modified synthetic sequence 
which encodes a polypeptide with at least one amino acid substitution 
relative to the polypeptide encoded by the first synthetic nucleic acid 
sequence. 

58. The method of claim 56 further comprising altering at least one codon in 
the second synthetic sequence to yield a second modified synthetic 
sequence which encodes a polypeptide with at least one amino acid 
substitution relative to the polypeptide encoded by the first synthetic 
nucleic acid sequence. 

59. The method of claim 55 wherein the synthetic sequences encode a 
luciferase. 

60. The synthetic nucleic acid molecule of claim 1 wherein the synthetic 
nucleic acid molecule is expressed at a level which is at least 110% of 
that of the wild type nucleic acid sequence in a cell or cell extract under 
identical conditions. 

61. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide 
encoded by the synthetic nucleic acid molecule has at least 90% 
contiguous sequence identity to the polypeptide encoded by the wild type 
nucleic acid sequence. 

62. The synthetic nucleic acid molecule of claim 1 wherein the polypeptide 
encoded by the synthetic nucleic acid molecule is identical in amino acid 
sequence to the polypeptide encoded by the wild type nucleic acid 
sequence. 

63. A vector comprising a synthetic nucleic acid molecule having at least 3- 
fold fewer transcriptional regulatory sequences relative to a vector 
comprising a parent nucleic acid sequence, wherein the transcription 
regulatory sequences are selected from the group consisting of 
transcription factor binding sequences, intron splice sites, poly(A) 
addition sites and promoter sequences. 

64. The vector of claim 63 wherein the synthetic nucleic acid molecule does 
not encode a polypeptide. 

65. The method of claim 48 or 49 further comprising altering the further 
synthetic nucleic acid molecule to encode a polypeptide having at least 
one amino acid substitution relative to the polypeptide encoded by the 
parent nucleic acid sequence. 

66. The method of claim 48 or 49 wherein the altering of transcription 
regulatory sequences does not introduce amino acid substitutions to the 
polypeptide encoded by the synthetic nucleic acid molecule. 

Figure 2 (cont.) 

[Nucleotide sequence alignment of GRVER51, GR6, GRVER5, GRVER4, GRVER3, 
GRVER2, GRVER1, YG81-6G1, RDVER1, RDVER2, RDVER3, RDVER4, RDVER5, RD7, 
RDVER51, RDVER52 and RD1561H9, shown in blocks ending at alignment positions 
640, 680, 720 and 760.] 



GRVER51 . SEQ 


T G G T 


C 


G G 


C 


T 


T 


G 


C G T 


G T 


C 


A T 


C 


A T G T 


T 


T 


c 


G 


T 


c 


G 


C 


T 


T 


C 


G 


A 


C 


C A 


800 


GR6.SEQ 


T G G T 


C 


G G 


C T 


T 


G 


C G T 


G T 


C 


A T 


,C A T G T T 


T 


c 


G 


T 


c 


G 


C 


T 


T 


C 


G 


A 


C 


C A 


800 


GRVER5 . SEQ 


T G G T 


C 


G G 


C 


T 


T 


G 


C G T 


G T 


C 


A T 


C A T G T 


T 


T 


c 


G 


T 


c 


G 


C 


T 


T 


C 


G 


A 


C 


C A 


800 


GRVER4 . SEQ 


T G G T 


c 


G G 


C 


T 


T 


G 


C G T 


G T 


C 


A T 


C 


A T G T 


T 


T 


c 


G 


T 


c 


G 


C 


T 


T 


C 


G 


A 


C 


C A 


800 


GRVER3 . SEQ 


T G G T 


c 


G G 


c_ 


T 


T 


G 


C G 


T 


G T 


G 


A T C A T G T 


T 


T 


c 


G 


T 


c 


G 


c 


T 


T 


C 


G 


A 


C 


C A 


800 


GRVER2 . SEQ 


T G G T 


c 


G G 


T 


T 


T 


G 


C G 


C 


G T 


G 


A T 


C A T G T 


T 


T C 


G 


T 


c 


G 


c 


T 


T 


C 


G 


A T 


C A 


800 


GRVER1 . SEQ 


T G G T 


c 


G G 


T 


T 


T 


G 


C G 


C 


G T 


G 


A T 


C A T G T 


T 


T 


c 


G 


T 


c 


G 


c 


T 


T 


c 


G 


A 


T 


C A 


800 


YG61-6G1. SEQ T GGTGGGTCTTCGT.GTT 


A T 


C 


A T G T 


V 


C A 


G A C 


G 


A 


T 


T 


T 


G 


A 


T 


C A 


800 


RDVER1 . SEQ 


T G G T 


G 


G G 


C 


C T 


G 


C G T 


G T 


C 


A T 


T 


A T G T 


T 


C 


c 


G 


C 


c 


G 


T 


T 


T 


T 


G 


A 


C 


C A 


800 


RDVER2 . SEQ 


T G G T 


G 


G G 


c 


C 


T 


G 


C G T 


G T 


C 


A T 


T 


A T G T 


T 


C 


c 


G 


C 


c 


G 


T 


T 


T 


T 


G 


A 


C 


C A 


800 


RDVER3 . SEQ 


T G G T 


C 


G G 


T 


C 


T 


G 


C G 


T 


G T 


c 


A T 


T 


A T G T 


T 


C 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A 


T 


C A 


800 


RDVER4 . SEQ 


T G G T 


C 


G G T 


c 


T 


G 


C G T 


G T 


G 


A T 


T 


A T G T 


T 


C 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A 


T 


C A 


800 


RDVER5.SEQ 


T G G T 


C 


G G T 


c 


T 


C 


C G 


C 


G T 


G 


A T 


T 


A T G T 


T 


c 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A T 


C A 


800 


RD7 . SEQ 


T G G T 


C 


G G 


T 


c 


T 


C 


C G 


C 


G T 


G 


A T 


T 


AT GT 


T 


c 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A 


T 


C A 


800 


RDVER51.SEQ 


T G G T 


c 


G G 


T 


c 


T 


C 


C G 


C 


G T 


G 


A T 


T 


A T G T 


T 


c 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A 


T 


C A 


800 


RDVER52 . SEQ 


T G G T 


c 


G G 


T 


c 


T 


C 


C G 


C 


G T 


G 


A T 


T 


A T G T 


T 


c 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A T 


C A 


800 


RD1561H9.SEQT G G T 


c 


G G T C T 


c 


C G 


c: 


G T 


G 


A T 


T 


A T G T 


T 


c 


c 


G 


c 


c 


G 


T 


T 


T 


T 


G 


A T 


C A 


800 



GRVER51 . SEQ 
GR6.SEQ 
GRVER5.SEQ 
GRVER4 . SEQ 
GRVER3.SEQ 
GRVER2 - SEQ 
GRVER1 . SEQ 



YG81-6G1.SEQA G A A G 



RDVER1 . SEQ - 
RDVER2 . SEQ 
. RDVER3.SEQ 
RDVER4 , SEQ 
jRDVER5.SEQ 
RD7.. SEQ 
RDVER51 . SEQ 
RDVER52 . SEQ 



RD1561H9.SEQG]G AlGJG C 




T C A 

T C A 

T C A 

T C A 

T C A 

T C A 

T C A G 

T C A G 



c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 


c 


C A 


A 



G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 



c 


T A 


C 


G A 


G 


G T 


G 


C 


G 


T 


c 


t A 


C 


G A 


G 


G T 


G 


C 


G 


T 


c 


T A 


C 


G A 


G 


G T 


G 


C 


G 


T 


c 


T A 


C 


G A 


G 


G T 


G 


C 


G 


T 


c 


T A 


C 


G A 


G 


G.T 


G 


c 


G 


T 


c 


T A 


C 


G A 


G 


G T 


C 


c 


G 


T 


c 


T A 


C 


G A 


G 


G T 


C 


c 


G 


T 



T T A T 

T T A T 

T T A T 

T T A T 

T T A T 

T T A T 

T T A T 

T T A T 

T T A T 

T T A T 



G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 

G A 



A G 

A G 

A G 

A G 

A G 

A G 

A G 

A G 

A G 

A G 



T T C G A 



C G 

C G 

C G 

C G 

C G 

C G 

C G 

C G 

C G 



840 

840 

840 

840 

840 

840 

840 

840 

840 

840 

840 

840 

840- 

840 

840 

840 

840 



Figure 2 (cont.)

Figure 3 (cont.)
Figure 5A

Codon Usage: YG#81-6G01 (yellow-green)

Figure 5B

Codon Usage: GRver1

Figure 5C

Codon Usage: RDver1

Figure 5D

Codon Usage: GRver2
GAT Asp 13    GGT Gly 18
GTC Val 28    GCC Ala 19    GAC Asp 13    GGC Gly 21
GTA Val  0    GCA Ala  0    GAA Glu 17    GGA Gly  0
GTG Val 22    GCG Ala  0    GAG Glu 21    GGG Gly  0
Figure 5E

Codon Usage: RDver2

TTT Phe 13    TCT Ser 16    TAT Tyr 10    TGT Cys  6
TTC Phe 12    TCC Ser  0    TAC Tyr 10    TGC Cys  5
TTA Leu  0    TCA Ser  0    TAA ***  0    TGA ***  0
TTG Leu 27    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  0    CCT Pro 15    CAT His  7    CGT Arg 13
CTC Leu  1    CCC Pro  0    CAC His  6    CGC Arg 13
CTA Leu  0    CCA Pro 13    CAA Gln  8    CGA Arg  0
CTG Leu 27    CCG Pro  0    CAG Gln  7    CGG Arg  0
ATT Ile 19    ACT Thr 11    AAT Asn 10    AGT Ser  0
ATC Ile 20    ACC Thr 11    AAC Asn 11    AGC Ser 14
ATA Ile  0    ACA Thr  0    AAA Lys 19    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 16    AGG Arg  0
GTT Val  0    GCT Ala 19    GAT Asp 13    GGT Gly 21
GTC Val 21    GCC Ala 17    GAC Asp 13    GGC Gly 18
GTA Val  0    GCA Ala  1    GAA Glu 21    GGA Gly  0
GTG Val 28    GCG Ala  0    GAG Glu 17    GGG Gly  0
Figure 5F

Codon Usage: GRver3

TTT Phe 13    TCT Ser 16    TAT Tyr  9    TGT Cys  7
TTC Phe 12    TCC Ser  0    TAC Tyr 10    TGC Cys  4
TTA Leu  0    TCA Ser  0    TAA ***  0    TGA ***  0
TTG Leu 26    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  0    CCT Pro 18    CAT His  6    CGT Arg 14
CTC Leu  5    CCC Pro  0    CAC His  7    CGC Arg 12
CTA Leu  0    CCA Pro 10    CAA Gln  9    CGA Arg  0
CTG Leu 24    CCG Pro  0    CAG Gln  5    CGG Arg  0
ATT Ile 14    ACT Thr 14    AAT Asn 11    AGT Ser  0
ATC Ile 24    ACC Thr  8    AAC Asn 11    AGC Ser 15
ATA Ile  0    ACA Thr  0    AAA Lys 21    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 14    AGG Arg  0
GTT Val  1    GCT Ala 18    GAT Asp 12    GGT Gly 18
GTC Val 22    GCC Ala 18    GAC Asp 14    GGC Gly 21
GTA Val  0    GCA Ala  1    GAA Glu 20    GGA Gly  0
GTG Val 27    GCG Ala  0    GAG Glu 18    GGG Gly  0
Figure 5G

Codon Usage: RDver3

TTT Phe ??    TCT Ser 14    TAT Tyr  7    TGT Cys  5
TTC Phe 12    TCC Ser  1    TAC Tyr 13    TGC Cys  5
TTA Leu  0    TCA Ser  0    TAA ***  0    TGA ***  0
TTG Leu 27    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  0    CCT Pro 16    CAT His 10    CGT Arg 16
CTC Leu  6    CCC Pro  0    CAC His  3    CGC Arg 10
CTA Leu  0    CCA Pro 12    CAA Gln  8    CGA Arg  0
CTG Leu 22    CCG Pro  0    CAG Gln  7    CGG Arg  0
ATT Ile 20    ACT Thr 10    AAT Asn 10    AGT Ser  0
ATC Ile 19    ACC Thr 12    AAC Asn 11    AGC Ser 15
ATA Ile  0    ACA Thr  0    AAA Lys 13    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 22    AGG Arg  0
GTT Val  0    GCT Ala 20    GAT Asp 14    GGT Gly 16
GTC Val 27    GCC Ala 16    GAC Asp 12    GGC Gly 23
GTA Val  0    GCA Ala  1    GAA Glu 18    GGA Gly  0
GTG Val 22    GCG Ala  0    GAG Glu 20    GGG Gly  0
Figure 5H

Codon Usage: GRver4

TTT Phe 11    TCT Ser 13    TAT Tyr  7    TGT Cys  8
TTC Phe 14    TCC Ser  2    TAC Tyr 12    TGC Cys  3
TTA Leu  0    TCA Ser  1    TAA ***  0    TGA ***  0
TTG Leu 21    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  1    CCT Pro 18    CAT His  7    CGT Arg 14
CTC Leu 11    CCC Pro  0    CAC His  6    CGC Arg 11
CTA Leu  0    CCA Pro 10    CAA Gln 11    CGA Arg  1
CTG Leu 22    CCG Pro  0    CAG Gln  3    CGG Arg  0
ATT Ile 13    ACT Thr 14    AAT Asn 11    AGT Ser  1
ATC Ile 25    ACC Thr  8    AAC Asn 11    AGC Ser 14
ATA Ile  0    ACA Thr  0    AAA Lys 20    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 15    AGG Arg  0
GTT Val  3    GCT Ala 19    GAT Asp 12    GGT Gly 17
GTC Val 22    GCC Ala 15    GAC Asp 14    GGC Gly 19
GTA Val  0    GCA Ala  3    GAA Glu 20    GGA Gly  3
GTG Val 25    GCG Ala  0    GAG Glu 18    GGG Gly  0
Figure 5I

Codon Usage: RDver4

TTT Phe 13    TCT Ser 11    TAT Tyr  7    TGT Cys  7
TTC Phe 12    TCC Ser  2    TAC Tyr 13    TGC Cys  4
TTA Leu  0    TCA Ser  2    TAA ***  0    TGA ***  0
TTG Leu 28    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  0    CCT Pro 16    CAT His 11    CGT Arg 15
CTC Leu  7    CCC Pro  2    CAC His  2    CGC Arg 11
CTA Leu  0    CCA Pro 10    CAA Gln  7    CGA Arg  0
CTG Leu 20    CCG Pro  0    CAG Gln  8    CGG Arg  0
ATT Ile 21    ACT Thr 11    AAT Asn 10    AGT Ser  1
ATC Ile 18    ACC Thr 11    AAC Asn 11    AGC Ser 14
ATA Ile  0    ACA Thr  0    AAA Lys 13    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 22    AGG Arg  0
GTT Val  3    GCT Ala 22    GAT Asp 15    GGT Gly 14
GTC Val 27    GCC Ala 11    GAC Asp 11    GGC Gly 21
GTA Val  0    GCA Ala  4    GAA Glu 18    GGA Gly  4
GTG Val 19    GCG Ala  0    GAG Glu 20    GGG Gly  0
Figure 5J

Codon Usage: GRver5

TTT Phe 10    TCT Ser 11    TAT Tyr  7    TGT Cys  3
TTC Phe 15    TCC Ser  4    TAC Tyr 12    TGC Cys  3
TTA Leu  0    TCA Ser  1    TAA ***  0    TGA ***  0
TTG Leu 23    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  1    CCT Pro 17    CAT His  6    CGT Arg 13
CTC Leu 12    CCC Pro  2    CAC His  7    CGC Arg 11
CTA Leu  0    CCA Pro  9    CAA Gln 11    CGA Arg  2
CTG Leu 19    CCG Pro  0    CAG Gln  3    CGG Arg  0
ATT Ile 15    ACT Thr 14    AAT Asn  9    AGT Ser  1
ATC Ile 23    ACC Thr  8    AAC Asn 13    AGC Ser 14
ATA Ile  0    ACA Thr  0    AAA Lys 19    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 16    AGG Arg  0
GTT Val  3    GCT Ala 18    GAT Asp 12    GGT Gly 16
GTC Val 21    GCC Ala 14    GAC Asp 14    GGC Gly 21
GTA Val  1    GCA Ala  5    GAA Glu 19    GGA Gly  1
GTG Val 25    GCG Ala  0    GAG Glu 19    GGG Gly  1
Figure 5K

Codon Usage: RDver5

TTT Phe 13    TCT Ser 12    TAT Tyr  7    TGT Cys  7
TTC Phe 12    TCC Ser  2    TAC Tyr 13    TGC Cys  4
TTA Leu  0    TCA Ser  2    TAA ***  0    TGA ***  0
TTG Leu 25    TCG Ser  0    TAG ***  0    TGG Trp  2
CTT Leu  1    CCT Pro 15    CAT His  9    CGT Arg 14
CTC Leu 11    CCC Pro  1    CAC His  4    CGC Arg 12
CTA Leu  0    CCA Pro 12    CAA Gln  7    CGA Arg  0
CTG Leu 18    CCG Pro  0    CAG Gln  8    CGG Arg  0
ATT Ile 19    ACT Thr 10    AAT Asn  9    AGT Ser  2
ATC Ile 20    ACC Thr 11    AAC Asn 12    AGC Ser 12
ATA Ile  0    ACA Thr  1    AAA Lys 13    AGA Arg  0
ATG Met 11    ACG Thr  0    AAG Lys 22    AGG Arg  0
GTT Val  5    GCT Ala 21    GAT Asp 14    GGT Gly 14
GTC Val 26    GCC Ala 12    GAC Asp 12    GGC Gly 21
GTA Val  1    GCA Ala  4    GAA Glu 18    GGA Gly  3
GTG Val 17    GCG Ala  0    GAG Glu 20    GGG Gly  1
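A codon usage table of the kind shown in Figures 5A-5K can be tallied from a coding sequence with a few lines of code. The following is a minimal sketch, not the method used to prepare the figures: the function names `codon_usage` and `table_rows` and the short test sequence `toy` are hypothetical and chosen only for illustration. The row layout (first base down the page, second base across, third base within each group of four rows) mirrors the layout of the figures.

```python
from collections import Counter

BASES = "TCAG"  # ordering used for rows and columns in the figures


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons, reading from the first base of `cds`."""
    seq = cds.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def table_rows(usage: Counter) -> list:
    """Format the counts as 16 rows of 4 codons each."""
    rows = []
    for first in BASES:
        for third in BASES:
            cells = []
            for second in BASES:
                codon = first + second + third
                cells.append(f"{codon} {usage[codon]:>2}")
            rows.append("  ".join(cells))
    return rows


# Hypothetical 27-nt test sequence (9 codons), for illustration only.
toy = "ATGATGAAACGCGAAAAGAACGTGATC"
usage = codon_usage(toy)
rows = table_rows(usage)
print("\n".join(rows))
```

Amino acid labels are omitted here for brevity; they could be added with a codon-to-amino-acid dictionary if desired.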
Figure 6

Synthetic oligos for engineered GR/RD genes
(All oligos listed 5' to 3')

Coding strand: 5' ( )n 3'

Non-coding strand: 3' ( )n 5'

Oligos with pRAM flanking sequence identical for GR/RD

1) coding strand upstream flanking

RAM-C1: ACGCCAGCCCAAGCTTAGGCCTGAGTGGC (SEQ ID NO: 35)
RAM-C2: CTTAATTCTCCCCATCCCCCTGTTGACAATTAATCATCGGCTCG (SEQ ID NO: 36)
RAM-C3: TATAATGTGAGGAATTGCGAGCGGATAACAATTTCACACA (SEQ ID NO: 37)

2) coding strand downstream flanking

RAM-C4: ATGGGATGTTACCTAGACCAATATGAAATATTTGGTAAAT (SEQ ID NO: 38)
RAM-C5: AAATGCTTAATGAATTTCAAAAAAAAAAAAAAAGGAATTC (SEQ ID NO: 39)
RAM-C6: GATATCAAGCTTATCGATACCGTCGACCTCGAGGATTATA (SEQ ID NO: 40)
RAM-C7: TAGAAAAAGGCCTCGGCGGCCGCTAGTTCAGTCAGTT (SEQ ID NO: 41)

3) non-coding strand downstream flanking

RAM-N1: AACTGACTGAACTAGCG (SEQ ID NO: 42)
RAM-N2: GCCGCCGAGGCCTTTTTCTATATAATCCTCGAGGTCGACG (SEQ ID NO: 43)
RAM-N3: GTATCGATAAGCTTGATATCGAATTCCTTTTTTTTTTTTT (SEQ ID NO: 44)
RAM-N3b: AGCTTGATATCGAATTCCTTTTTTTTTTTTTTTGAAATTC (SEQ ID NO: 45)
RAM-N4: TTGAAATTCATTAAGCATTTATTTACCAAATATTTCATAT (SEQ ID NO: 46)
RAM-N5: TGGTCTAGGTAACATCCCATCACTAGCTTTTTTTTCTATA (SEQ ID NO: 47)

4) non-coding strand upstream flanking

RAM-N6: TCGCAATTCCTCACATTATACGAGCCGATGATTAATTGTC (SEQ ID NO: 48)
RAM-N7: AACAGGGGGATGGGGAGAATTAAGGCCACTCAGGCCTAAGCTTGGGCTGGCGT (SEQ ID NO: 49)

GRver5 with flanking sequence of pRAM to end of Sfi I primers

1) coding strand (start and stop codons are underlined)

GR-C1: GGAAACAGGATCCCATGATGAAACGCGAAAAGAACGTGAT (SEQ ID NO: 50)
GR-C2: CTACGGCCCAGAACCACTGCATCCACTGGAAGACCTCACC (SEQ ID NO: 51)
GR-C3: GCTGGTGAGATCCTCTTCCGAGCACTGCGTAAACATAGTC (SEQ ID NO: 52)
GR-C4: ACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAG (SEQ ID NO: 53)
GR-C5: CCTCTCCTACAAAGAATTTTTCGAAGCTACTGTGCTGTTG (SEQ ID NO: 54)
GR-C6: GCCCAAAGCCTCCATAATTGTGGGTACAAAATGAACGATG (SEQ ID NO: 55)
GR-C7: TGGTGAGCATTTGTGCTGAGAATAACACTCGCTTCTTTAT (SEQ ID NO: 56)
GR-C8: TCCTGTAATCGCTGCTTGGTACATCGGCATGATTGTCGCC (SEQ ID NO: 57)
GR-C9: CCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGG (SEQ ID NO: 58)
GR-C10: TTATGGGTATTAGCAAACCTCAAATCGTCTTTACTACCAA (SEQ ID NO: 59)
GR-C11: AAACATCTTGAATAAGGTCTTGGAAGTCCAGTCTCGTACT (SEQ ID NO: 60)
GR-C12: AACTTCATCAAACGCATCATTATTCTGGATACCGTCGAAA (SEQ ID NO: 61)
GR-C13: ACATCCACGGCTGTGAGAGCCTCCCTAACTTCATCTCTCG (SEQ ID NO: 62)
GR-C14: TTACAGCGATGGTAATATCGCTAATTTCAAGCCCTTGCAT (SEQ ID NO: 63)
GR-C15: TTTGATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCT (SEQ ID NO: 64)
GR-C16: CCGGCACCACTGGTTTGCCTAAAGGTGTCATGCAGACTCA (SEQ ID NO: 65)
GR-C17: CCAGAATATCTGTGTGCGTTTGATCCACGCTCTCGACCCT (SEQ ID NO: 66)
GR-C18: CGTGTGGGTACTCAATTGATCCCTGGCGTGACTGTGCTGG (SEQ ID NO: 67)
GR-C19: TGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTAC (SEQ ID NO: 68)
GR-C20: CCTGGGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTT (SEQ ID NO: 69)
Figure 6 (cont.)

GR-C21: CGTCGCTTCGACCAAGAAGCCTTCTTGAAGGCTATTCAAG (SEQ ID NO: 70)
GR-C22: ACTACGAGGTGCGTTCCGTGATCAACGTCCCTTCAGTCAT (SEQ ID NO: 71)
GR-C23: TTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAGTATGATCTG (SEQ ID NO: 72)
GR-C24: AGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTT (SEQ ID NO: 73)
GR-C25: TGGCCAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAA (SEQ ID NO: 74)
GR-C26: CCTCCCTGGTATCCGCTGCGGTTTTGGTTTGACTGAGAGC (SEQ ID NO: 75)
GR-C27: ACTTCTGCTAACATCCATAGCTTGCGAGACGAGTTTAAGT (SEQ ID NO: 76)
GR-C28: [sequence illegible] (SEQ ID NO: 77)
GR-C29: GATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAAT (SEQ ID NO: 78)
GR-C30: CAAGTCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTA (SEQ ID NO: 79)
GR-C31: AAGGCTACGTGAACAATGTGGAGGCCACTAAAGAAGCCAT (SEQ ID NO: 80)
GR-C32: TGATGATGATGGCTGGCTCCATAGCGGCGACTTCGGTTAC (SEQ ID NO: 81)
GR-C33: TATGATGAGGACGAACACTTCTATGTGGTCGATCGCTACA (SEQ ID NO: 82)
GR-C34: AAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGC (SEQ ID NO: 83)
GR-C35: CGAACTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGC (SEQ ID NO: 84)
GR-C36: [sequence illegible] (SEQ ID NO: 85)
GR-C37: AGTTGCCTAGCGCCTTTGTGGTGAAACAACCCGGCAAGGA (SEQ ID NO: 86)
GR-C38: GATCACTGCTAAGGAGGTCTACGACTATTTGGCCGAGCGC (SEQ ID NO: 87)
GR-C39: GTGTCTCACACCAAATATCTGCGTGGCGGCGTCCGCTTCG (SEQ ID NO: 88)
GR-C40: TCGATTCTATTCCACGCAACGTTACCGGTAAGATCACTCG (SEQ ID NO: 89)
GR-C41: TAAAGAGTTGCTGAAGCAACTCCTCGAAAAAGCTGGCGGC (SEQ ID NO: 90)
GR-C42: TAGTAAAGTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 91)

2) non-coding strand

GR-N1: TAATCATGAAGACTTTACTAGCCGCCAGCTTTTTCGAGGA (SEQ ID NO: 92)
GR-N2: GTTGCTTCAGCAACTCTTTACGAGTGATCTTACCGGTAAC (SEQ ID NO: 93)
GR-N3: GTTGCGTGGAATAGAATCGACGAAGCGGACGCCGCCACG (SEQ ID NO: 94)
GR-N4: CAGATATTTGGTGTGAGACACGCGCTCGGCCAAATAGTCGT (SEQ ID NO: 95)
GR-N5: AGACCTCCTTAGCAGTGATCTCCTTGCCGGGTTGTTTCAC (SEQ ID NO: 96)
GR-N6: CACAAAGGCGCTAGGCAACTCGCCAGCTTCCAAGTCTGGG (SEQ ID NO: 97)
GR-N7: ATACCCACGACGGCCACGTCGCGGATA (SEQ ID NO: 98)
GR-N8: GCAAAATTTCTTCCAGTTCGGCTGGTGCGACTTGAGAGCC (SEQ ID NO: 99)
GR-N9: TTTGTACTTAATCAATTCTTTGTAGCGATCGACCACATAG (SEQ ID NO: 100)
GR-N10: AAGTGTTCGTCCTCATCATAGTAACCGAAGTCGCCGCTAT (SEQ ID NO: 101)
GR-N11: GGAGCCAGCCATCATCATCAATGGCTTCTTTAGTGGCCTC (SEQ ID NO: 102)
GR-N12: CACATTGTTCACGTAGCCTTTAGAGACCATAGGGCCCTTA (SEQ ID NO: 103)
GR-N13: ATACACAATTCACCGACTTGATTTGGGCCCAGTGCTTTGC (SEQ ID NO: 104)
GR-N14: CGGTCTCACGGTCGGCGATCTTTGCAGCCATAAGAGGAGT (SEQ ID NO: 105)
GR-N15: CACGCGACCCAGGCTACCAGACTTAAACTCGTCTCGCAAG (SEQ ID NO: 106)
GR-N16: CTATGGATGTTAGCAGAAGTGCTCTCAGTCAAACCAAAAC (SEQ ID NO: 107)
GR-N17: CGCAGCGGATACCAGGGAGGTTCAGACGCTTAGCAGCGAC (SEQ ID NO: 108)
GR-N18: CTCGGCCACTTCTTTGGCCAAAGGAGCAGCGCCACAGCAC (SEQ ID NO: 109)
GR-N19: AGCTCACGCAAGCTGCTCAGATCATACTTGTCAACCAAAG (SEQ ID NO: 110)
GR-N20: GAGATTTGCTCAGGAACAAAATGACTGAAGGGACGTTGAT (SEQ ID NO: 111)
GR-N21: CACGGAACGCACCTCGTAGTCTTGAATAGCCTTCAA (SEQ ID NO: 112)
GR-N22: GAAGGCTTCTTGGTCGAAGCGACGAAACATGATGACACGCAAGC (SEQ ID NO: 113)
GR-N23: CGACCATGAAATAGCCCAGGGTAATAGAGAAACCAAAGGC (SEQ ID NO: 114)
GR-N24: GTGAAAGAAAGGCAGATACACCAGCACAGTCACGCCAGGG (SEQ ID NO: 115)
GR-N25: ATCAATTGAGTACCCACACGAGGGTCGAGAGCGTGGATCA (SEQ ID NO: 116)
GR-N26: AACGCACACAGATATTCTGGTGAGTCTGCATGACACCTTT (SEQ ID NO: 117)
GR-N27: AGGCAAACCAGTGGTGCCGGAGGAGCACAAAATAGCGGCC (SEQ ID NO: 118)
GR-N28: ACTTGCTCGACTGGATCAAAATGCAAGGGCTTGAAATTAG (SEQ ID NO: 119)
GR-N29: CGATATTACCATCGCTGTAACGAGAGATGAAGTTAGGGAG (SEQ ID NO: 120)
GR-N30: GCTCTCACAGCCGTGGATGTTTTCGACGGTATCCAGAATA (SEQ ID NO: 121)
GR-N31: ATGATGCGTTTGATGAAGTTAGTACGAGACTGGACTTCCA (SEQ ID NO: 122)
GR-N32: AGACCTTATTCAAGATGTTTTTGGTAGTAAAGACGATTTG (SEQ ID NO: 123)
GR-N33: AGGTTTGCTAATACCCATAACCTTACACAGCTCATCTGGG (SEQ ID NO: 124)
GR-N34: ATGTAAGATTCATTCACAGGGGCGACAATCATGCCGATGT (SEQ ID NO: 125)
GR-N35: ACCAAGCAGCGATTACAGGAATAAAGAAGCGAGTGTTATT (SEQ ID NO: 126)
GR-N36: CTCAGCACAAATGCTCACCACATCGTTCATTTTGTACCCA (SEQ ID NO: 127)
GR-N37: CAATTATGGAGGCTTTGGGCCAACAGCACAGTAGCTTCGA (SEQ ID NO: 128)
GR-N38: AAAATTCTTTGTAGGAGAGGCTCTCGTCTCCCACGACGTC (SEQ ID NO: 129)
GR-N39: CACGAGTGCTTGAGGGAGGTGACTATGTTTACGCAGTGCT (SEQ ID NO: 130)
GR-N40: CGGAAGAGCATCTCACCAGCGGTGAGGTCTTCCAGTGGAT (SEQ ID NO: 131)
GR-N41: GCAGTGGTTCTGGGCCGTAGATCACGTTCTTTTCGCGTTT (SEQ ID NO: 132)
GR-N42: CATCATGGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC (SEQ ID NO: 133)
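The coding and non-coding oligos listed above appear to be staggered so that each non-coding oligo bridges two neighbouring coding oligos. The sketch below checks this for one pair taken from the list (GR-N41 against GR-C2). It is an illustration only: the helper names `reverse_complement` and `overlap_length` are hypothetical, and the 20-nucleotide stagger is an observation from the listed sequences rather than something stated in this figure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


# Sequences copied from the GR-C2 and GR-N41 entries in the list above.
GR_C2 = "CTACGGCCCAGAACCACTGCATCCACTGGAAGACCTCACC"
GR_N41 = "GCAGTGGTTCTGGGCCGTAGATCACGTTCTTTTCGCGTTT"

# Express the non-coding oligo in coding-strand orientation, then measure
# how far its 3' end runs into the 5' end of the coding oligo.
n41_as_coding = reverse_complement(GR_N41)
overlap = overlap_length(n41_as_coding, GR_C2)
print(overlap)  # 20 for this pair
```

The same check can be repeated across the full list to flag entries whose sequence may have been damaged in transcription.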

RDver5 with flanking sequence of pRAM to end of Sfi I primers

1) coding strand

RD-C1: GGAAACAGGATCCCATGATGAAGCGTGAGAAAAATGTCAT (SEQ ID NO: 134)
RD-C2: CTATGGCCCTGAGCCTCTCCATCCTTTGGAGGATTTGACT (SEQ ID NO: 135)
RD-C3: GCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGCACTCTC (SEQ ID NO: 136)
RD-C4: ATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATC (SEQ ID NO: 137)
RD-C5: TTTGAGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTG (SEQ ID NO: 138)
RD-C6: GCTCAGTCCCTCCACAATTGTGGCTACAAGATGAACGACG (SEQ ID NO: 139)
RD-C7: TCGTTAGTATCTGTGCTGAAAACAATACCCGTTTCTTCAT (SEQ ID NO: 140)
RD-C8: TCCAGTCATCGCCGCATGGTATATCGGTATGATCGTGGCT (SEQ ID NO: 141)
RD-C9: CCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAG (SEQ ID NO: 142)
RD-C10: TCATGGGTATCTCTAAGCCACAGATTGTCTTCACCACTAA (SEQ ID NO: 143)
RD-C11: GAATATTCTGAACAAAGTCCTGGAAGTCCAAAGCCGCACC (SEQ ID NO: 144)
RD-C12: AACTTTATTAAGCGTATCATCATCTTGGACACTGTGGAGA (SEQ ID NO: 145)
RD-C13: ATATTCACGGTTGCGAATCTTTGCCTAATTTCATCTCTCG (SEQ ID NO: 146)
RD-C14: CTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCAC (SEQ ID NO: 147)
RD-C15: TTCGACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCA (SEQ ID NO: 148)
RD-C16: GCGGTACTACTGGACTCCCAAAGGGAGTCATGCAGACCCA (SEQ ID NO: 149)
RD-C17: TCAAAACATTTGCGTGCGTCTGATCCATGCTCTCGATCCA (SEQ ID NO: 150)
RD-C18: CGCTACGGCACTCAGCTGATTCCTGGTGTCACCGTCTTGG (SEQ ID NO: 151)
RD-C19: TCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTAC (SEQ ID NO: 152)
RD-C20: TTTGGGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTC (SEQ ID NO: 153)
RD-C21: CGCCGTTTTGATCAGGAGGCTTTCTTGAAAGCCATCCAAG (SEQ ID NO: 154)
RD-C22: ATTATGAAGTCCGCAGTGTCATCAACGTGCCTAGCGTGAT (SEQ ID NO: 155)
RD-C23: CCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAGTACGAC (SEQ ID NO: 156)
RD-C24: TTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCAC (SEQ ID NO: 157)
RD-C25: TGGCTAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAA (SEQ ID NO: 158)
RD-C26: TCTTCCAGGGATTCGTTGTGGCTTCGGCCTCACCGAATCT (SEQ ID NO: 159)
RD-C27: ACCAGCGCTATTATTCAGTCTCTCCGCGATGAGTTTAAGA (SEQ ID NO: 160)
RD-C28: GCGGCTCTTTGGGCCGTGTCACTCCACTCATGGCTGCTAA (SEQ ID NO: 161)
RD-C29: GATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCTAAC (SEQ ID NO: 162)
RD-C30: CAAGTGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCA (SEQ ID NO: 163)
RD-C31: AGGGTTATGTCAATAACGTCGAAGCTACCAAGGAGGCCAT (SEQ ID NO: 164)
RD-C32: CGACGACGACGGCTGGTTGCATTCTGGTGATTTTGGATAT (SEQ ID NO: 165)
RD-C33: TACGACGAAGATGAGCATTTTTACGTCGTGGATCGTTACA (SEQ ID NO: 166)
RD-C34: AGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGC (SEQ ID NO: 167)
RD-C35: TGAGTTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGC (SEQ ID NO: 168)
RD-C36: GATGTCGCTGTGGTCGGCATTCCTGATCTGGAGGCCGGCG (SEQ ID NO: 169)
RD-C37: AACTGCCTTCTGCTTTCGTTGTCAAGCAGCCTGGTAAAGA (SEQ ID NO: 170)
RD-C38: AATTACCGCCAAAGAAGTGTATGATTACCTGGCTGAACGT (SEQ ID NO: 171)
RD-C39: GTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTG (SEQ ID NO: 172)
RD-C40: TTGACTCCATCCCTCGTAACGTAACAGGCAAAATTACCCG (SEQ ID NO: 173)
RD-C41: CAAGGAGCTGTTGAAACAATTGTTGGAGAAGGCCGGCGGT (SEQ ID NO: 174)
RD-C42: TAGTAAAGTCTTCATGATTATATAGAAAAAAAAGCTAGTG (SEQ ID NO: 175)



2) non-coding strand

RD-N1: TAATCATGAAGACTTTACTAACCGCCGGCCTTCTCCAACA (SEQ ID NO: 176)
RD-N2: ATTGTTTCAACAGCTCCTTGCGGGTAATTTTGCCTGTTAC (SEQ ID NO: 177)
RD-N3: GTTACGAGGGATGGAGTCAACAAAACGCACGCCGCCACGC (SEQ ID NO: 178)
RD-N4: AAGTACTTAGTATGGCTCACACGTTCAGCCAGGTAATCAT (SEQ ID NO: 179)
RD-N5: ACACTTCTTTGGCGGTAATTTCTTTACCAGGCTGCTTGAC (SEQ ID NO: 180)
RD-N6: AACGAAAGCAGAAGGCAGTTCGCCGGCCTCCAGATCAGGA (SEQ ID NO: 181)
RD-N7: ATGCCGACCACAGCGACATCGCGAATGCATGGATTTTTCA (SEQ ID NO: 182)
RD-N8: ACAGAATCTCCTCCAACTCAGCTGGAGCAACCTGGCTACC (SEQ ID NO: 183)
RD-N9: CTTGTATTTGATCAGCTCCTTGTAACGATCCACGACGTAA (SEQ ID NO: 184)
RD-N10: AAATGCTCATCTTCGTCGTAATATCCAAAATCACCAGAAT (SEQ ID NO: 185)
RD-N11: GCAACCAGCCGTCGTCGTCGATGGCCTCCTTGGTAGCTTC (SEQ ID NO: 186)
RD-N12: GACGTTATTGACATAACCCTTGCTCACCATAGGGCCTTTG (SEQ ID NO: 187)
RD-N13: ATACACAGCTCGCCCACTTGGTTAGGGCCCAAAGCCTTAC (SEQ ID NO: 188)
RD-N14: CAGTTTCGCGATCAGCGATCTTAGCAGCCATGAGTGGAGT (SEQ ID NO: 189)
RD-N15: GACACGGCCCAAAGAGCCGCTCTTAAACTCATCGCGGAGA (SEQ ID NO: 190)
RD-N16: GACTGAATAATAGCGCTGGTAGATTCGGTGAGGCCGA (SEQ ID NO: 191)
RD-N17: AGCCACAACGAATCCCTGGAAGATTCAAGCGTTTGGCGGCCAC (SEQ ID NO: 192)
RD-N18: TTCAGCGACCTCCTTAGCCAGTGGAGCGGCACCGCAACAC (SEQ ID NO: 193)
RD-N19: AATTCACGCAGTGAAGACAAGTCGTACTTGTCCACGAGTG (SEQ ID NO: 194)
RD-N20: GGCTCTTAGACAAAAACAGGATCACGCTAGGCACGTTGAT (SEQ ID NO: 195)
RD-N21: GACACTGCGGACTTCATAATCTTGGATGGCTTTCAAGAAA (SEQ ID NO: 196)
RD-N22: GCCTCCTGATCAAAACGGCGGAACATAATCACGCGGAGAC (SEQ ID NO: 197)
RD-N23: CGACCATAAAGTAACCCAAAGTAATATGAAAGCCGAAAGC (SEQ ID NO: 198)
RD-N24: ATGGAAGAAAGGCAAGTAGACCAAGACGGTGACACCAGGA (SEQ ID NO: 199)
RD-N25: ATCAGCTGAGTGCCGTAGCGTGGATCGAGAGCATGGATCA (SEQ ID NO: 200)
RD-N26: GACGCACGCAAATGTTTTGATGGGTCTGCATGACTCCCTT (SEQ ID NO: 201)
RD-N27: TGGGAGTCCAGTAGTACCGCTGCTACACAGAATGGCTGCA (SEQ ID NO: 202)
RD-N28: ACTTGTTCCACAGGGTCGAAGTGGAGTGGTTTAAAGTTTG (SEQ ID NO: 203)
RD-N29: CGATGTTGCCGTCTGAATAGCGAGAGATGAAATTAGGCAA (SEQ ID NO: 204)
RD-N30: AGATTCGCAACCGTGAATATTCTCCACAGTGTCCAAGATG (SEQ ID NO: 205)
RD-N31: ATGATACGCTTAATAAAGTTGGTGCGGCTTTGGACTTCCA (SEQ ID NO: 206)
RD-N32: GGACTTTGTTCAGAATATTCTTAGTGGTGAAGACAATCTG (SEQ ID NO: 207)
RD-N33: TGGCTTAGAGATACCCATGACTTTACACAGTTCGTCGGGA (SEQ ID NO: 208)
RD-N34: ATGTAGCTCTCGTTGACTGGAGCCACGATCATACCGATAT (SEQ ID NO: 209)
RD-N35: ACCATGCGGCGATGACTGGAATGAAGAAACGGGTATTGTT (SEQ ID NO: 210)
RD-N36: TTCAGCACAGATACTAACGACGTCGTTCATCTTGTAGCCA (SEQ ID NO: 211)
RD-N37: CAATTGTGGAGGGACTGAGCCAGCAAGACGGTTGCCTCAA (SEQ ID NO: 212)
RD-N38: AAAACTCCTTGTAGCTCAAAGATTCATCGCCGACCACATC (SEQ ID NO: 213)
RD-N39: GACCAAGGCTTGAGGCAAATGAGAGTGCTTGCGGAGAGCA (SEQ ID NO: 214)
RD-N40: CGAAACAGCATTTCGCCGGCAGTCAAATCCTCCAAAGGAT (SEQ ID NO: 215)
RD-N41: GGAGAGGCTCAGGGCCATAGATGACATTTTTCTCACGCTT (SEQ ID NO: 216)
RD-N42: CATCATGGGATCCTGTTTCCTGTGTGAAATTGTTATCCGC (SEQ ID NO: 217)
[Sequence alignment of RELLUC.SEQ, RLUCVER1.SEQ, RLUCVER2.SEQ and RLUCFINL.SEQ, positions 1-160. RELLUC.SEQ rows:]

RELLUC.SEQ ATGACTTCGAAAGTTTATGATCCAGAACAAAGGAAACGGA 40
RELLUC.SEQ TGATAACTGGTCCGCAGTGGTGGGCCAGATGTAAACAAAT 80
RELLUC.SEQ GAATGTTCTTGATTCATTTATTAATTATTATGATTCAGAA 120
RELLUC.SEQ AAACATGCAGAAAATGCTGTTATTTTTTTACATGGTAACG 160


G 


160 


RELLUC.SEQ C 


G 


G 


C C 


T 


C 


T 


T 


C T 


T 


A T 


T 


T 


A T 


G G 


C 


G AC AT 


G 


T 


T 


G 


T G C 


C 


A C A T 


A 


T 


200 


RLUCVERl. SEQ C 


C 


G 


C C 


T 


C 


C 


A 


G C 


T 


A 


C 


C 


T 


G 


T 


G G 


A 


G 


G 


C A 


C 


G 


T 


G 


G 


T G C 


C 


T 


C A 


C 


A 


T 


200 


RLUCVER2 . SEQ C 


T 


G 


C C 


T 


C 


C 


A 


G C 


T 


A 


C 


C 


T 


G 


T 


G G 


A 


G 


G 


C A 


C 


G 


T 


C 


G T G C 


C 


T 


C A 


C 


A 


T 


200 


RLUCFINL. SEQ C 


T 


G 


C C 


T 


C 


C 


A 


G C 


T 


A 


C 


C 


T 


G 


T 


G G 


A 


G 


G 


C A 


C 


G 


T 


C 


G T G C 


C 


T 


C A 


C 


A 


T 


200 



RELLUC . SEQ TGAGCCAGTAGCGCGGTGTATT^AT^ACCA^GA^CT^AT^GG^ 
RLUCVERl - SEOfcj G A G C C 
RLUCVER2 . SEQ C G A G C C 
RLUCFINL. SEQIc G A G C C 



C 


G 


T 


G 


G C 


C 


C G 


C 


T 


G 


C 


A T 


C 


A T 


C 


C 


C 


T 


G A[c]c T 


G 


A T 


G 


G 


G 


C 


C 


G 


T 


G 


G C 


T 


C G 


C 


T 


G 


C 


A T 


c 


A T 


C 


C 


C 


T 


G A T C T 


G 


A T 


c 


G 


G 


A 


C 


G 


T 


G 


G C 


T A| G 


A 


T 


G 


_C 


A T 


c 


A T 


C 


C 


C 


T 


G A T C T 


G 


A T 


c 


G 


G 


A 



RELLUC.SEQ ATGGGCAAATCAGGCA A A T C T G GTAATGGTTCTT A T A G G T 



RLUCVERl . SEQ A T G G G C A A 


G 


T 


C 


C 


G G 


C 


A A 


GAG 


C 


G G 


C 


a a[c|g g 


C 


T 


C 


C 


T A[C 


C 


G 


C 


C 


RLUCVER2 . SEQ A T G G G 


T 


A A 


G 


T 


C 


C 


G G 


C 


A A 


GAG 


C 


G G 


G 


A A T G G 


C 


T 


C 


A 


TAT 


C 


G 


C 


C 


RLUCFINL . SEQ A T G G G 


T 


A A 


G 


T 


C 


C 


G G 


C 


A A 


GAG 


C 


G G 


G 


A A T G G 


C 


T 


C 


A 


TAT 


C 


G 


C 


C 



240 
240 
240 
240 

280 
280 
280 
280 



RELLUC.SEQ TACTTGATCATTACAAATATCTTACTGCATGGTTTGAACT 320 



RLUCVERl . SEQ T 


G 


C T 


G 


g a[cJc A 


C 


T AC A A 


G 


T A 


C 


C 


T 


G 


A C 


C 


G C 


c 


T G G T 


T 


C 


G A 


G 


C 


T 320 


RLUCVER2 . SEQ T 


C 


CT 


G 


G A T C A 


C 


T A C A A 


G 


T A 


C 


c 


T 


c 


A C 


c 


G C 


T 


T G G T 


T 


C 


G A 


G 


C 


T 320 


RLUCFINL. SEQ T 


C 


C T 


G 


G A T C A 


C 


T A C A A 


G 


T A 


c 


c 


T 


c 


A C 


c 


G C 


T 


T G G T 


T 


C 


G A 


G 


c 


T 320 



RELLUC . SEQ TCTTAATTTACCAAAGAAGATCATTTTTGTCGGCCATGAjC 
RLUCVERl . SEqJg^ 
RLUCVER2 . SEQ G 



C T 
C T 

RLUCFINL. SEQlGjC T 



A A 
A A 
A A 



C C 
C C 
C C 



cc(c]aagaagatcat 



360 



CCAAAGAA 
CCAAAGAA 



ATCAT 
A T C A T 



T t[c]g T 
T T T G T 
TTTGT 



G G C C A 
G G C C A 
GGCCA 



C 


G A 


C 


360 


C 


G A 


C 


360 


c 


G A 


C 


360 



RELLUC . SEQ TGGGGTGCTTGTTTGGCATTTCATTATAGCTAT^GAGCAT^C 
RLUCVERl . SEQ T G G G g|a|g c(c]t g\c cIt G G C [c T T \E\ C A 
RLUCVER2 . SEQ T GGGGGGCTTGTCTGGCCTTTCA 
RLUCFINL. SEQ T GGGGGGCTTGTCTGGCCTTTCA 



400 



C 


T A" 


C 


T 


C 


C T A 


C 


G A G C A 


C 


c 


400 


c 


T A 


C 


T 


C 


C T A 


C 


GAGCA 


C 


c 


400 


c 


T A 


C 


T 


C 


C T A 


C 


G A G C A 


c 


c 


400 



RELLUC.SEQ A A G A T A A G A T C A A A G C A A T A G T T C A C G CTG AJ\_ A GTGTAGT 440 
RLUCVERl . SEQ A [g] G A 
RLUCVER2 . SEQ A A G A 
RLUCFINL . SEQ A AG A 



AAGATCAA 
AAGATCAA 
AAGATCAA 



G 


G 


C 


C 


A T 


c 


G 


T 


G 


C A C 


G c(c]g A 


G 


A g[c]g T 


G 


G 


T 440 


G 


G 


C 


C 


A T 


C 


G 


T 


c 


C A 


T 


G C T G A 


G 


A G T G T 


C 


G 


T 440 


G 


G 


C 


c 


A T 


c 


G 


T 


C 


C A 


T 


GCT'GA 


G 


A G T G T 


c 


G 


T 440 
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RELLUC. SEQ A G A T G T 6 AT TGA A T C A T G GG AT G AAT G 6 CC T 6 AT AT T 6 A A 
RLUCVERl. SEQG 
RLUCVER2.SEQG 
RLUCFINL. SEQG 



G 


A 


C 


G 


T 


G 


A T 


C 


G A 


G 


T C 


C 


T 


G 


G 


G A 


C 


G A 


G 


T 


G G 


C 


C 


T 


G A 


C 


A T 


C 


G A 


G 


480 


G 


A 


C 


G 


T 


G 


A T 


C 


G A 


G 


T C 


C 


T 


G 


G 


G A 


C 


G A 


G 


T 


G G 


C 


C 


T 


G A 


C 


A T 


C 


G A 


G 


400 


G 


A 


C 


G 


T 


G 


A T 


c 


G A 
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Figure 7 (Cont) 
Figure 9A

Codon usage in RELLUC

(Renilla reniformis; GenBank Accession: M63501; Medline: 91239583)

Figure 9B

Codon usage in Rluc-final
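
A codon usage table of the kind named in Figures 9A and 9B can be tallied directly from a coding sequence. The following is a minimal sketch, not the procedure used to prepare the figures. The function name `codon_usage` is hypothetical, and the example fragment is taken from oligonucleotide RLS1 (SEQ ID NO:246) with its first four nucleotides omitted, on the assumption that the remainder begins the coding region in frame.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count each codon in a coding sequence read in frame from its first base."""
    seq = cds.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# Illustrative fragment: RLS1 (SEQ ID NO:246) without its first 4 nucleotides.
# Treating this as the in-frame start of the coding region is an assumption.
fragment = "ATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"

usage = codon_usage(fragment)
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Applied to a complete coding sequence, the same tally gives the per-codon counts from which a usage table can be built.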
Figure 10

Oligonucleotides for the assembly of synthetic Renilla luciferase gene

Sense Strand

Oligo name        Oligo sequence from 5' to 3'
RLS1 (1-40)       AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA (SEQ ID NO:246)
RLS2 (41-80)      CGCATGATCACTGGGCCTCAGTGGTGGGCTCGCTGCAAGC (SEQ ID NO:247)
RLS3 (81-120)     AAATGAACGTGCTGGACTCCTTCATCAACTACTATGATTC (SEQ ID NO:248)
RLS4 (121-170)    CGAGAAGCACGCCGAGAACGCCGTGATTTTTCTGCATGGTAACGCTGCCT (SEQ ID NO:249)
RLS5 (171-210)    CCAGCTACCTGTGGAGGCACGTCGTGCCTCACATCGAGCC (SEQ ID NO:250)
RLS6 (211-250)    CGTGGCTAGATGCATCATCCCTGATCTGATCGGAATGGGT (SEQ ID NO:251)
RLS7 (251-290)    AAGTCCGGCAAGAGCGGGAATGGCTCATATCGCCTCCTGG (SEQ ID NO:252)
RLS8 (291-330)    ATCACTACAAGTACCTCACCGCTTGGTTCGAGCTGCTGAA (SEQ ID NO:253)
RLS9 (331-370)    CCTTCCAAAGAAAATCATCTTTGTGGGCCACGACTGGGGG (SEQ ID NO:254)
RLS10 (371-410)   GCTTGTCTGGCCTTTCACTACTCCTACGAGCACCAAGACA (SEQ ID NO:255)
RLS11 (411-450)   AGATCAAGGCCATCGTCCATGCTGAGAGTGTCGTGGACGT (SEQ ID NO:256)
RLS12 (451-495)   GATCGAGTCCTGGGACGAGTGGCCTGACATCGAGGAGGATATCGC (SEQ ID NO:257)
RLS13 (496-535)   CCTGATCAAGAGCGAAGAGGGCGAGAAAATGGTGCTTGAG (SEQ ID NO:258)
RLS14 (536-575)   AATAACTTCTTCGTCGAGACCATGCTCCCAAGCAAGATCA (SEQ ID NO:259)
RLS15 (576-620)   TGCGGAAACTGGAGCCTGAGGAGTTCGCTGCCTACCTGGAGCCAT (SEQ ID NO:260)
RLS16 (621-660)   TCAAGGAGAAGGGCGAGGTTAGACGGCCTACCCTCTCCTG (SEQ ID NO:261)
RLS17 (661-700)   GCCTCGCGAGATCCCTCTCGTTAAGGGAGGCAAGCCCGAC (SEQ ID NO:262)
RLS18 (701-740)   GTCGTCCAGATTGTCCGCAACTACAACGCCTACCTTCGGG (SEQ ID NO:263)
RLS19 (741-780)   CCAGCGACGATCTGCCTAAGATGTTCATCGAGTCCGACCC (SEQ ID NO:264)
RLS20 (781-820)   TGGGTTCTTTTCCAACGCTATTGTCGAGGGAGCTAAGAAG (SEQ ID NO:265)
RLS21 (821-860)   TTCCCTAACACCGAGTTCGTGAAGGTGAAGGGCCTCCACT (SEQ ID NO:266)
RLS22 (861-900)   TCAGCCAGGAGGACGCTCCAGATGAAATGGGTAAGTACAT (SEQ ID NO:267)
RLS23 (901-949)   CAAGAGCTTCGTGGAGCGCGTGCTGAAGAACGAGCAGTAATTCTAGAGC (SEQ ID NO:268)

Anti-sense Strand

Oligo name        Oligo sequence from 5' to 3'
RLAS1 (1-29)      GCTCTAGAATTACTGCTCGTTCTTCAGCA (SEQ ID NO:269)
RLAS2 (30-69)     CGCGCTCCACGAAGCTCTTGATGTACTTACCCATTTCATC (SEQ ID NO:270)
RLAS3 (70-109)    TGGAGCGTCCTCCTGGCTGAAGTGGAGGCCCTTCACCTTC (SEQ ID NO:271)
RLAS4 (110-149)   ACGAACTCGGTGTTAGGGAACTTCTTAGCTCCCTCGACAA (SEQ ID NO:272)
RLAS5 (150-189)   TAGCGTTGGAAAAGAACCCAGGGTCGGACTCGATGAACAT (SEQ ID NO:273)
RLAS6 (190-229)   CTTAGGCAGATCGTCGCTGGCCCGAAGGTAGGCGTTGTAG (SEQ ID NO:274)
RLAS7 (230-269)   TTGCGGACAATCTGGACGACGTCGGGCTTGCCTCCCTTAA (SEQ ID NO:275)
RLAS8 (270-309)   CGAGAGGGATCTCGCGAGGCCAGGAGAGGGTAGGCCGTCT (SEQ ID NO:276)
RLAS9 (310-349)   AACCTCGCCCTTCTCCTTGAATGGCTCCAGGTAGGCAGCG (SEQ ID NO:277)
RLAS10 (350-394)  AACTCCTCAGGCTCCAGTTTCCGCATGATCTTGCTTGGGAGCATG (SEQ ID NO:278)
RLAS11 (395-434)  GTCTCGACGAAGAAGTTATTCTCAAGCACCATTTTCTCGC (SEQ ID NO:279)
RLAS12 (435-474)  CCTCTTCGCTCTTGATCAGGGCGATATCCTCCTCGATGTC (SEQ ID NO:280)
RLAS13 (475-517)  AGGCCACTCGTCCCAGGACTCGATCACGTCCACGACACTCTCA (SEQ ID NO:281)
RLAS14 (518-559)  GCATGGACGATGGCCTTGATCTTGTCTTGGTGCTCGTAGGAG (SEQ ID NO:282)
RLAS15 (560-599)  TAGTGAAAGGCCAGACAAGCCCCCCAGTCGTGGCCCACAA (SEQ ID NO:283)
RLAS16 (600-639)  AGATGATTTTCTTTGGAAGGTTCAGCAGCTCGAACCAAGC (SEQ ID NO:284)
RLAS17 (640-679)  GGTGAGGTACTTGTAGTGATCCAGGAGGCGATATGAGCCA (SEQ ID NO:285)
RLAS18 (680-719)  TTCCCGCTCTTGCCGGACTTACCCATTCCGATCAGATCAG (SEQ ID NO:286)
RLAS19 (720-764)  GGATGATGCATCTAGCCACGGGCTCGATGTGAGGCACGACGTGCC (SEQ ID NO:287)
RLAS20 (765-804)  TCCACAGGTAGCTGGAGGCAGCGTTACCATGCAGAAAAAT (SEQ ID NO:288)
RLAS21 (805-849)  CACGGCGTTCTCGGCGTGCTTCTCGGAATCATAGTAGTTGATGAA (SEQ ID NO:289)
RLAS22 (850-889)  GGAGTCCAGCACGTTCATTTGCTTGCAGCGAGCCCACCAC (SEQ ID NO:290)
RLAS23 (890-929)  TGAGGCCCAGTGATCATGCGTTTGCGTTGCTCGGGGTCGT (SEQ ID NO:291)
RLAS24 (930-949)  ACACCTTGGAAGCCATGGTT (SEQ ID NO:292)
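
The sense and anti-sense oligonucleotides of Figure 10 are staggered, so that each anti-sense oligonucleotide pairs with parts of one or two sense oligonucleotides. The following minimal sketch checks this complementarity for the 5' end of the sense strand. The helper `reverse_complement` is a hypothetical name, and the 20-nucleotide offsets are inferred from the oligonucleotide lengths in the figure rather than stated in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sequences copied from Figure 10 (SEQ ID NOs: 246, 247, 291, 292).
RLS1 = "AACCATGGCTTCCAAGGTGTACGACCCCGAGCAACGCAAA"
RLS2 = "CGCATGATCACTGGGCCTCAGTGGTGGGCTCGCTGCAAGC"
RLAS23 = "TGAGGCCCAGTGATCATGCGTTTGCGTTGCTCGGGGTCGT"
RLAS24 = "ACACCTTGGAAGCCATGGTT"

# RLAS24 pairs with the first 20 nt of RLS1 (offset assumed from the lengths).
pairs_with_start = reverse_complement(RLS1[:20]) == RLAS24

# RLAS23 bridges the last 20 nt of RLS1 and the first 20 nt of RLS2.
bridges_junction = reverse_complement(RLS1[20:] + RLS2[:20]) == RLAS23

print(pairs_with_start, bridges_junction)
```

The same comparison can be repeated along the gene to confirm that every junction between adjacent sense oligonucleotides is spanned by an anti-sense oligonucleotide.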
Figure 11

[Alignment of the nucleotide sequences GRVER51.SEQ (1626 nucleotides), LUCPPLYG.SEQ (1629 nucleotides) and RD1561H9.SEQ (1626 nucleotides).]
Figure 12



[Alignment of the amino acid sequences encoded by GRVER51.SEQ, LUCPPLYG.SEQ and RD1561H9.SEQ.]
GRver5.1 DNA sequence of pGL3 vectors

ATGGTGAAACGCGAAAAGAACGTGATCTACGGCCCAGAACCACTGCATCC 50
ACTGGAAGACCTCACCGCTGGTGAGATGCTCTTCCGAGCACTGCGTAAAC 100
ATAGTCACCTCCCTCAAGCACTCGTGGACGTCGTGGGAGACGAGAGCCTC 150
TCCTAC^y^GAATTTTTCGAAGCTACTGTGCTGTTGGCCCAAAGCCTCCA 200
TAATTGTGGGTACAAAATGAACGATGTGGTGAGCATTTGTGCTGAGAATA 250
ACACTCGCTTCTTTATTCCTGTAATCGCTGCTTGGTACATCGGCATGATT 300
GTCGCCCCTGTGAATGAATCTTACATCCCAGATGAGCTGTGTAAGGTTAT 350
GGGTATTAGCAAACCTCAAATCGTCTTTACTACCAAAAACATCTTGAATA 400
AGGTCTTGGAAGTCCAGTCTCGTACTAACTTCATCAAACGCATCATTATT 450
CTGGATACCGTCGAAAACATCCACGGCTGTGAGAGCCTCCCTAACTTCAT 500
CTCTCGTTACAGCGATGGTAATATCGCTAATTTCAAGCCOTTGCATTTTG 550
ATCCAGTCGAGCAAGTGGCCGCTATTTTGTGCTCCTCCGGCACCACTGGT 600
TTGCCTAAAGGTGTCATGCAGACTCACCAGAATATCTGTGTGCGTTTGAT 650
CCACGCTCTCGACCCTCGTGTGGGTACTCAATTGATCCCTGGCGTGACTG 700
TGCTGGTGTATCTGCCTTTCTTTCACGCCTTTGGTTTCTCTATTACCCTG 750
GGCTATTTCATGGTCGGCTTGCGTGTCATCATGTTTCGTCGCTTCGACCA 800
AGAAGCCTTCTTGAAGGCTATTCAAGACTACGAGGTGCGTTCCGTGATCA 850
ACGTCCCTTCAGTCATTTTGTTCCTGAGCAAATCTCCTTTGGTTGACAAG 900
TATGATCTGAGCAGCTTGCGTGAGCTGTGCTGTGGCGCTGCTCCTTTGGC 950
CAAAGAAGTGGCCGAGGTCGCTGCTAAGCGTCTGAACCTCCCTGGTATCC 1000
GCTGCGGTTTTGGTTTGACTGAGAGCACTTCTGCTAACATCCATAGCTTG 1050
CGAGACGAGTTTAAGTCTGGTAGCCTGGGTCGCGTGACTCCTCTTATGGC 1100
TGCAAAGATCGCCGACCGTGAGACCGGCAAAGCACTGGGCCCAAATCAAG 1150
TCGGTGAATTGTGTATTAAGGGCCCTATGGTCTCTAAAGGCTACGTGAAC 1200
AATGTGGAGGCCACTAAAGAAGCCATTGATGATGATGGCTGGCTCCATAG 1250
CGGCGACTTCGGTTACTATGATGAGGACGAACACTTCTATGTGGTCGATC 1300
GCTACAAAGAATTGATTAAGTACAAAGGCTCTCAAGTCGCACCAGCCGAA 1350
CTGGAAGAAATTTTGCTGAAGAACCCTTGTATCCGCGACGTGGCCGTCGT 1400
GGGTATCCCAGACTTGGAAGCTGGCGAGTTGCCTAGCGCCTTTGTGGTGA 1450
AACAACCCGGCAAGGAGATCACTGCTAAGGAGGTCTACGACTATTTGGCC 1500
GAGCGCGTGTCTCACACCAAATATCTGCGTGGCGGCGTCCGCTTCGTCGA 1550
TTCTATTCCACGCAACGTTACCGGTAAGATCACTCGTAAAGAGTTGCTGA 1600
AGCAACTCCTCGAAAAAGCTGGCGGC 1626
RDver5.1 DNA sequence of pGL3 vectors

ATGGTGAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTCATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGCGCTATTATTCAGTCTCTC 1050
CGCGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTAAAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGAGAAGGCCGGCGGT 1626
RD1561H9 DNA sequence of pGL3 vectors

ATGGTAAAGCGTGAGAAAAATGTCATCTATGGCCCTGAGCCTCTCCATCC 50
TTTGGAGGATTTGACTGCCGGCGAAATGCTGTTTCGTGCTCTCCGCAAGC 100
ACTCTCATTTGCCTCAAGCCTTGGTCGATGTGGTCGGCGATGAATCTTTG 150
AGCTACAAGGAGTTTTTTGAGGCAACCGTCTTGCTGGCTCAGTCCCTCCA 200
CAATTGTGGCTACAAGATGAACGACGTCGTTAGTATCTGTGCTGAAAACA 250
ATACCCGTTTCTTCATTCCAGTCATCGCCGCATGGTATATCGGTATGATC 300
GTGGCTCCAGTCAACGAGAGCTACATTCCCGACGAACTGTGTAAAGTCAT 350
GGGTATCTCTAAGCCACAGATTGTCTTCACCACTAAGAATATTCTGAACA 400
AAGTCCTGGAAGTCCAAAGCCGCACCAACTTTATTAAGCGTATCATCATC 450
TTGGACACTGTGGAGAATATTCACGGTTGCGAATCTTTGCCTAATTTCAT 500
CTCTCGCTATTCAGACGGCAACATCGCAAACTTTAAACCACTCCACTTCG 550
ACCCTGTGGAACAAGTTGCAGCCATTCTGTGTAGCAGCGGTACTACTGGA 600
CTCCCAAAGGGAGTCATGCAGACCCATCAAAACATTTGCGTGCGTCTGAT 650
CCATGCTCTCGATCCACGCTACGGCACTCAGCTGATTCCTGGTGTCACCG 700
TCTTGGTCTACTTGCCTTTCTTCCATGCTTTCGGCTTTCATATTACTTTG 750
GGTTACTTTATGGTCGGTCTCCGCGTGATTATGTTCCGCCGTTTTGATCA 800
GGAGGCTTTCTTGAAAGCCATCCAAGATTATGAAGTCCGCAGTGTCATCA 850
ACGTGCCTAGCGTGATCCTGTTTTTGTCTAAGAGCCCACTCGTGGACAAG 900
TACGACTTGTCTTCACTGCGTGAATTGTGTTGCGGTGCCGCTCCACTGGC 950
TAAGGAGGTCGCTGAAGTGGCCGCCAAACGCTTGAATCTTCCAGGGATTC 1000
GTTGTGGCTTCGGCCTCACCGAATCTACCAGTGCGATTATCCAGACTCTC 1050
GGGGATGAGTTTAAGAGCGGCTCTTTGGGCCGTGTCACTCCACTCATGGC 1100
TGCTAAGATCGCTGATCGCGAAACTGGTAAGGCTTTGGGCCCGAACCAAG 1150
TGGGCGAGCTGTGTATCAAAGGCCCTATGGTGAGCAAGGGTTATGTCAAT 1200
AACGTTGAAGCTACCAAGGAGGCCATCGACGACGACGGCTGGTTGCATTC 1250
TGGTGATTTTGGATATTACGACGAAGATGAGCATTTTTACGTCGTGGATC 1300
GTTACAAGGAGCTGATCAAATACAAGGGTAGCCAGGTTGCTCCAGCTGAG 1350
TTGGAGGAGATTCTGTTGAAAAATCCATGCATTCGCGATGTCGCTGTGGT 1400
CGGCATTCCTGATCTGGAGGCCGGCGAACTGCCTTCTGCTTTCGTTGTCA 1450
AGCAGCCTGGTACAGAAATTACCGCCAAAGAAGTGTATGATTACCTGGCT 1500
GAACGTGTGAGCCATACTAAGTACTTGCGTGGCGGCGTGCGTTTTGTTGA 1550
CTCCATCCCTCGTAACGTAACAGGCAAAATTACCCGCAAGGAGCTGTTGA 1600
AACAATTGTTGGTGAAGGCCGGCGGT 1626
GRver5.1 protein sequence of pGL3 vectors

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRVGTQLIPGVTVLVYLPFFHAFGFSITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542



RDver5.1 protein sequence of pGL3 vectors 

MVKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 50
SYKEFFEATVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMI 100
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150
LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200
LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250
GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQSL 350
RDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGKEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLEKAGG 542



RD1561H9 protein sequence of pGL3 vectors 

MVKREKNVI YGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESL 5 0 
S YKEFFEATVLLAQSLHNCGYKMNDVVS I CAENNTRFFI PVIAAWYIGMI 100 
VAPVNESYIPDELCKVMGISKPQIVFTTKNILNKVLEVQSRTNFIKRIII 150 

LDTVENIHGCESLPNFISRYSDGNIANFKPLHFDPVEQVAAILCSSGTTG 200

LPKGVMQTHQNICVRLIHALDPRYGTQLIPGVTVLVYLPFFHAFGFHITL 250

GYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLVDK 300 
YDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSAIIQTL 350
GDEFKSGSLGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVN 400 
NVEATKEAIDDDGWLHSGDFGYYDEDEHFYVVDRYKELIKYKGSQVAPAE 450 
LEEILLKNPCIRDVAVVGIPDLEAGELPSAFVVKQPGTEITAKEVYDYLA 500
ERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLVKAGG 542
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SEQUENCE LISTING 

<110> Promega Corporation 
Wood, Keith V.
Gruber, Monika G.
Zhuang, Yao
Paguio, Aileen

<120> Synthetic nucleic acid molecule compositions and methods of preparation

<130> 341.005WO1 

<150> US 09/645,706
<151> 2000-08-24 

<160> 302 

<170> FastSEQ for Windows Version 4.0

<210> 1 
<211> 1629 
<212> DNA 
<213> Pyrophorus plagiophthalamus



<400> 1 



atgatgaaga 


gagagaaaaa 


tgttatatat 


ggacccgaac 


ccctacaccc 


cttggaagac 


60 


ttaacagcag 


gagaaatgct 


cttcagggcc 


cttcgaaaac 


attctcattt 


accgcaggct 


120 


ttagtagatg


tgtttggtga 


cgaatcgctt 


tcctataaag 


agttttttga 


agctacatgc 


180 


ctcctagcgc 


aaagtctcca 


caattgtgga 


tacaagatga 


atgatgtagt 


gtcgatctgc 


240 


gccgagaata 


ataaaagatt 


ttttattccc 


attattgcag 


cttggtatat 


tggtatgatt 


300 


gtagcacctg 


ttaatgaaag 


ttacatccca gatgaactct 


gtaaggtcat 


gggtatatcg 


360 


aaaccacaaa 


tagttttttg 


tacaaagaac 


attttaaata 


aggtattgga 


ggtacagagc 


420 


agaactaatt


tcataaaaag 


gatcatcata 


cttgatactg 


tagaaaacat 


acacggttgt 


480 


gaaagtcttc 


ccaattttat 


ttctcgttat 


tcggatggaa 


atattgccaa 


cttcaaacct 


540 


ttacattacg 


atcctgttga 


gcaagtggca 


gctatcttat 


gttcgtcagg 


cactactgga 


600 


ttaccgaaag 


gtgtaatgca 


aactcaccaa 


aatatttgtg 


tccgacttat 


acatgcttta 


660 


gaccccaggg 


caggaacgca 


acttattcct 


ggtgtgacag 


tcttagtata 


tctgcctttt 


720 


ttccatgctt


ttgggttctc 


tataaacttg ggatacttca 


tggtgggtct 


tcgtgttatc 


780 


atgttaagac 


gatttgatca 


agaagcattt ctaaaagcta 


ttcaggatta 


tgaagttcga 


840 
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agtgtaatta 


acgttccagc 


aataatattg ttcttatcga 


aaagtccttt 


ggttgacaaa 


900 


tacgatttat 


caagtttaag 


ggaattgtgt tgcggtgcgg 


caccattagc 


aaaagaagtt 


960 


gctgaggttg 


cagtaaaacg 


attaaacttg ccaggaattc 


gctgtggatt 


tggtttgaca 


1020 


gaatctactt 


cagctaatat 


acacagtctt ggggatgaat 


ttaaatcagg 


atcacttgga 


1080 


agagttactc


ctttaatggc 


agctaaaata gcagataggg 


aaactggtaa 


agcattggga 


1140 


ccaaatcaag 


ttggtgaatt 


atgcgttaaa ggtcccatgg tatcgaaagg 


ttacgtgaac 


1200 


aatgtagaag 


ctaccaaaga 


agctattgat gatgatggtt 


ggcttcactc 


tggagacttt 


1260 


ggatactatg


atgaggatga


gcatttctat gtggtggacc 


gttacaagga 


attgattaaa 


1320 


tataagggct 


ctcaggtagc 


acctgcagaa ctagaagaga


ttttattgaa 


aaatccatgt 


1380 


atcagagatg


ttgctgtggt 


tggtattcct gatctagaag 


ctggagaact 


gccatctgcg 


1440 


tttgtggtta 


aacagcccgg 


aaaggagatt acagctaaag 


aagtgtacga 


ttatcttgcc


1500 


gagagggtct 


cccatacaaa 


gtatttgcgt ggaggggttc gattcgttga tagcatacca 


1560 


aggaatgtta 


caggtaaaat 


tacaagaaag gaacttctga 


agcagttgct


ggagaagagt 


1620 


tctaaactt 










1629 




<210> 2 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of clone YG#81-6G01 



<400> 2 



atgatgaagc


gagagaaaaa 


tgttatatat 


ggacccgaac 


ccctacaccc 


cttggaagac 


60 


ttaacagctg 


gagaaatgct


cttccgtgcc 


cttcgaaaac 


attctcattt 


accgcaggct 


120 


ttagtagatg 


tggttggcga 


cgaatcgctt


tcctataaag 


agttttttga 


agcgacagtc


180 


ctcctagcgc 


aaagtctcca 


caattgtgga 


tacaagatga 


atgatgtagt 


gtcgatctgc 


240 


gccgagaata


atacaagatt 


ttttattccc 


gttattgcag 


cttggtatat 


tggtatgatt 


300 


gtagcacctg


ttaatgaaag 


ttacatccca 


gatgaactct gtaaggtgat gggtatatcg 


360 


aaaccacaaa 


tagtttttac 


gacaaagaac 


attttaaata 


aggtattgga 


ggtacagagc 


420 


agaactaatt 


tcataaaaag 


gatcatcata 


cttgatactg 


tagaaaacat 


acacggttgt 


480 


gaaagtcttc 


ccaattttat 


ttctcgttat 


tcggatggaa 


atattgccaa


cttcaaacct 


540 


ttacatttcg 


atcctgttga 


gcaagtggca 


gctatcttat 


gttcgtcagg


cactactgga 


600 


ttaccgaaag


gtgtaatgca 


aactcaccaa 


aatatttgtg 


tccgacttat 


acatgcttta


660 


gaccccaggg 


caggaacgca


acttattcct 


ggtgtgacag 


tcttagtata 


tctgcctttt


720 


ttccatgctt


ttgggttctc 


tataaccttg 


ggatacttca 


tggtgggtct 


tcgtgttatc 


780 


atgttcagac 


gatttgatca 


agaagcattt 


ctaaaagcta 


ttcaggatta 


tgaagttcga 


840 


agtgtaatta 


acgttccatc 


agtaatattg 


ttcttatcga 


aaagtccttt 


ggttgacaaa 


900 


tacgatttat


caagtttaag 


ggaattgtgt 


tgcggtgcgg 


caccattagc 


aaaagaagtt 


960 
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gctgaggttg 


cagcaaaacg 


attaaacttg 


ccaggaattc 


gctgtggatt 


tggtttgaca 


1020 


gaatctactt 


cagctaatat 


acacagtctt 


agggatgaat 


ttaaatcagg 


atcacttgga 


1080 


agagttactc 


ctttaatggc 


agctaaaata gcagataggg aaactggtaa 


agcattggga 


1140 


ccaaatcaag 


ttggtgaatt 


atgcattaaa 


ggtcccatgg 


tatcgaaagg 


ttacgtgaac 


1200 


aatgtagaag


ctaccaaaga 


agctattgat 


gatgatggtt 


ggcttcactc 


tggagacttt 


1260 


ggatactatg 


atgaggatga 


gcatttctat 


gtggtggacc 


gttacaagga 


attgattaaa 


1320 


tataagggct 


ctcaggtagc 


acctgcagaa 


ctagaagaga 


ttttattgaa 


aaatccatgt 


1380 


atcagagatg 


ttgctgtggt 


tggtattcct 


gatctagaag 


ctggagaact 


gccatctgcg 


1440 


tttgtggtta 


aacagcccgg 


aaaggagatt 


acagctaaag 


aagtgtacga 


ttatcttgcc 


1500 


gagagggtct


cccatacaaa 


gtatttgcgt 


ggaggggttc 


gattcgttga 


tagcatacca 


1560 


aggaatgtta 


caggtaaaat 


tacaagaaag gaacttctga 


agcagttgct 


ggagaaggcg 


1620 


ggaggt 












1626 



<210> 3 

<211> 1626

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 3 



atgatgaaac 


gcgaaaagaa 


cgtcatctac 


ggcccagagc 


ctctgcaccc 


attggaagac 


60 


ctgaccgccg 


gtgagatgtt 


gttccgtgct 


ctgcgtaaac 


attctcactt 


gcctcaagcc 


120 


ctggtggatg


tcgtgggcga 


cgaaagcttg 


tcttataagg 


agtttttcga 


agctactgtc 


180 


ctgttggccc 


agtctctgca 


taattgcggt 


tacaaaatga 


acgatgtggt 


cagcatttgt 


240 


gctgagaata 


acacccgctt 


tttcatccca 


gtgattgccg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttatatccca 


gacgagttgt 


gcaaggtcat 


gggtattagc 


360 


aaacctcaaa 


tcgtgtttac 


taccaagaac 


attctgaata 


aagtcttgga 


agtgcagtct 


420 


cgtactaact


tcatcaagcg 


cattatcatt 


ctggataccg 


tcgagaatat 


ccacggctgt 


480 


gaaagcttgc 


caaactttat 


ttctcgttat 


agcgacggta 


atatcgctaa 


cttcaagcct 


540 


ctgcattttg 


atccagtgga 


gcaagtcgcc 


gctattttgt 


gctctagcgg 


cactaccggt 


600 


ctgcctaaag 


gcgtgatgca 


gactcaccaa 


aatatctgtg 


tccgcttgat 


tcatgccctg 


660 


gacccacgtg 


tgggtaccca 


gttgatccct 


ggcgtgactg 


tcctggtgta 


cttgccattc 


720 


tttcacgcct


tcggtttttc 


tattaccctg 


ggctatttca 


tggtcggttt 


gcgcgtgatc 


780 


atgtttcgtc 


gcttcgatca 


agaagctttt 


ctgaaggcca 


ttcaggacta 


cgaggtccgt 


840 


agcgtgatca 


acgtcccttc 


tgtgattttg 


ttcctgagca 


aatctccatt 


ggtcgataag 


900 


tatgacctga 


gctctttgcg 


cgaactgtgc 


tgtggcgctg 


cccctttggc 


taaagaggtg 


960 


gccgaagtcg 


ctgccaagcg 


tctgaatttg 


ccaggtatcc 


gctgcggctt 


tggtctgact 


1020 


gagagcacct


ctgctaacat 


tcatagcttg 


cgtgatgaat 


tcaaatctgg cagcctgggt 


1080 
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cgcgtgactc ctttgatggc cgctaagatc gccgaccgtg agaccggcaa agctctgggt 1140
ccaaatcaag tcggcgaatt gtgtattaag ggtcctatgg tgtctaaagg ctacgtcaac 1200
aatgtggagg ccactaagga agctatcgat gacgatggtt ggctgcacag cggcgacttt 1260
ggttattacg atgaggacga acatttctat gtcgtggatc gctacaaaga gttgattaag 1320
tataaaggct ctcaggtcgc cccagctgag ctggaagaga tcttgctgaa gaacccttgc 1380
attcgtgacg tggccgtcgt gggtatccca gatttggaag ctggcgagct gcctagcgcc 1440
tttgtcgtga aacaaccagg taaggaaatt accgctaaag aggtctacga ctatttggcc 1500
gaacgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtgga tagcatccct 1560
cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620
ggtggc 1626



<210> 4 
<211> 1626 
<212> DNA 
<213> Artificial Sequence

<220> 

<223> Sequence of a synthetic luciferase 



<400> 4



atgatgaaac 


gcgaaaagaa 


cgtcatctac 


ggcccagagc 


ctctgcaccc 


attggaagac 


60 


ctgaccgctg 


gtgagatgtt 


gttccgtgct 


ctgcgtaaac 


attctcactt 


gcctcaagcc 


120 


ctggtcgatg 


tcgtgggcga 


cgagagcttg 


tcttataagg 


aatttttcga 


agctactgtc 


180 


ctgttggccc 


aatctctgca 


taattgcggt 


tacaaaatga 


acgatgtggt 


cagcatttgt 


240 


gctgagaata


acacccgctt 


tttcatccca 


gtgattgccg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttatatccca 


gacgagttgt 


gcaaggtcat 


gggtattagc 


360 


aaacctcaaa 


tcgtgtttac 


taccaagaac 


attctgaata 


aggtcttgga 


agtgcagtct 


420 


cgtactaact 


tcatcaagcg 


cattatcatt 


ctggataccg 


tcgagaatat 


ccacggctgt 


480 


gagagcttgc 


caaactttat 


ttctcgttat 


agcgacggta 


atatcgctaa 


cttcaagcct 


540 


ctgcattttg


atccagtgga 


gcaagtcgcc 


gctattttgt 


gctctagcgg 


caccaccggt 


600 


ctgcctaaag 


gcgtgatgca 


gactcaccaa 


aatatctgtg 


tccgcttgat 


tcatgccctg 


660 


gacccacgtg 


tgggtactca 


gttgatccct 


ggcgtgactg 


tcctggtgta 


cttgccattc 


720 


tttcacgcct 


tcggtttttc 


tattaccctg 


ggctatttca 


tggtcggttt 


gcgcgtgatc 


780 


atgtttcgtc 


gcttcgatca 


agaagccttt 


ctgaaggcca ttcaagacta 


cgaggtccgt 


840 


agcgtgatca


acgtcccttc 


tgtgattttg 


ttcctgagca 


aatctccatt 


ggtcgataag 


900 


tatgacctga 


gcagcttgcg 


cgaactgtgc 


tgtggcgctg 


cccctttggc 


taaagaggtg 


960 


gccgaagtcg 


ctgccaagcg 


tctgaatttg 


ccaggtatcc 


gctgcggctt 


tggtctgact 


1020 


gagagcacct 


ctgctaacat 


tcatagcttg 


cgtgatgagt 


tcaaatctgg 


cagcctgggt 


1080 


cgcgtgactc 


ctttgatggc 


cgctaagatc 


gccgaccgtg 


agaccggcaa 


agctctgggt 


1140 


ccaaatcaag


tcggcgaatt 


gtgtattaag 


ggtcctatgg tgtctaaagg ctacgtcaac 


1200 
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aatgtggagg ccactaagga agctattgat gacgatggtt ggctgcacag cggcgacttt 1260 

ggttattacg atgaggacga acatttctat gtcgtcgatc gctacaaaga gttgattaag 1320 

tataaaggct ctcaagtcgc cccagctgag ctggaagaaa tcttgctgaa gaacccttgc 1380 

attcgtgacg tggccgtcgt gggtatccca gatttggaag ctggcgagct gcctagcgcc 1440 

tttgtcgtga aacaaccagg caaggaaatt accgctaaag aggtctacga ctatttggcc 1500

gagcgcgtgt ctcacactaa gtacctgcgt ggcggtgtcc gcttcgtcga tagcatccct 1560 

cgcaatgtca ccggcaaaat tactcgtaag gagttgctga aacagttgct ggaaaaggct 1620 

ggtggc 1626 



<210> 5
<211> 1626 
<212> DNA 

<213> Artificial Sequence 



<220>

<223> Sequence of a synthetic luciferase 



<400> 5 



atgatgaaac 


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg


gtgagatgct 


gttccgtgcc 


ctgcgtaaac 


atagccacct 


gcctcaagct 


120 


ctcgtggacg 


tcgtgggtga 


cgagagcctg 


tcttacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctgca 


taattgtggt 


tacaaaatga 


acgatgtggt 


gagcatctgt 


240 


gctgagaata 


acactcgctt 


ttttatccct 


gtgatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagttgt 


gtaaggtgat 


gggtattagc 


360 


aaacctcaaa


tcgtctttac 


taccaaaaac 


atcctgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaatt 


tcatcaaacg 


cattattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480 


gagagcttgc 


ctaactttat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagcca 


540 


ctgcattttg 


atccagtcga 


gcaggtcgcc 


gccattttgt 


gctcttctgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgcttgat 


ccacgccctc 


660 


gaccctcgtg


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tttgcctttc 


720 


tttcacgcct 


ttggtttttc 


tatcaccctg 


ggctatttca 


tggtcggctt 


gcgtgtgatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ctgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tctgtgatca 


atgtcccatc 


tgtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg 


tgaactgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgccaacat 


ccatagcttg 


cgtgacgagt 


ttaaatctgg 


tagcctgggt 


1080 


cgcgtgaccc 


ctttgatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agccctgggc 


1140 


ccaaatcagg 


tcggtgaatt 


gtgcattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg 


ccactaaaga 


agctattgat 


gatgatggtt 


ggttgcatag 


cggcgacttc 


1260 


ggttattatg


atgaggacga 


acacttctat 


gtggtcgatc gctataaaga attgattaag 


1320 
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tacaaaggct ctcaagtcgc cccagctgaa ctggaagaaa ttttgctgaa gaacccttgt 1380
attcgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440
tttgtggtga aacaacctgg caaggagatt actgctaagg aggtctacga ctatttggcc 1500
gagcgcgtgt ctcacactaa atatctgcgt ggcggcgtcc gcttcgtcga ttctatccct 1560
cgcaacgtca ccggcaagat cactcgtaaa gagttgctga aacaattgct cgaaaaagct 1620
ggcggc 1626



<210> 6 
<211> 1626 
<212> DNA

<213> Artificial Sequence 

<220> 

<223> Sequence of a synthetic luciferase




<400> 6 



atgatgaaac


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct


cttccgtgca 


ctgcgtaaac


atagtcacct 


ccctcaagct


120


ctcgtggacg


tcgtgggaga 


cgagagcctc 


tcttacaaag 


aatttttcga 


agctactgtg 


180 


20ctgttggccc 


aaagcctcca 


taattgtgga 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttatccct 


gttatcgctg 


cttggtacat 


cggcatgatt


300 


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat 


gggtattagc 


360 


aaacctcaaa 


tcgtctttac


taccaaaaat 


atcctgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaact 


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccatggctgt 


480 


gagagcctgc


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa


tttcaaacca 


540 


ctgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcttccgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct 


ttggtttttc 


tattaccctg 


ggctatttca 


tggtcggctt


gcgtgtcatc 


780 


atgtttcgtc


gcttcgacca 


agaagccttc


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tctgtcatca 


atgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgctaacat


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc


ctcttatggc 


tgcaaagatc


gccgaccgtg 


agaccggcaa


agcactgggc 


1140 


ccaaatcaag 


tcggtgaatt


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac


1200 


aatgtggagg 


ccactaaaga 


agccattgat


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc 


gctacaaaga 


attgattaag 


1320 


tacaaaggct 


ctcaagtcgc 


cccagccgaa 


ctggaagaaa ttttgctgaa gaacccttgt 


1380 


atccgcgacg


tggccgtcgt 


gggtatccca 


gacttggaag ctggtgagtt gcctagcgcc 


1440 



WO 02/16944 
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7 

tttgtggtga aacaacctgg aaaggagatc actgctaagg aggtctacga ctatttggcc 1500 
gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttccatccca 1560 
cgcaacgtga ccggtaagat cactcgtaaa gaattgctga agcaactcct cgaaaaagct 1620 
ggcggc 1626 


<210> 7 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of a synthetic luciferase



<400> 7 



atgatgaaac


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct 


cttccgagca 


ctgcgtaaac 


atagtcacct 


ccctcaagca 


120


ctcgtggacg 


tcgtgggaga 


cgagagcctc 


tcctacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctcca 


taattgtggg 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttattcct 


gtaatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat 


gggtattagc 


360 


aaacctcaaa 


tcgtctttac 


taccaaaaac 


atcttgaata 


aggtcttgga 


agtccagtct 


420 


cgtactaact 


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480 


gagagcctcc 


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagccc 


540 


ttgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcctccgg 


caccactggt 


600 


ttgcctaaag


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct 


ttggtttctc 


tattaccctg 


ggctatttca 


tggtcggctt 


gcgtgtcatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tccgtgatca 


acgtcccttc 


agtcattttg 


ttcctgagca aatctccttt


ggttgacaag 


900 


tatgatctga


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc 


ctcttatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agcactgggc 


1140 


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc 


gctacaaaga 


attgattaag 


1320 


tacaaaggct 


ctcaagtcgc 


accagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg 


tggccgtcgt 


gggtatccca 


gacttggaag ctggcgagtt gcctagcgcc 


1440 


tttgtggtga 


aacaacccgg 


caaggagatc 


actgctaagg aggtctacga ctatttggcc 


1500 


gagcgcgtgt


ctcacaccaa 


atatctgcgt 


ggcggcgtcc 


gcttcgtcga 


ttctattcca 


1560 
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cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620 
ggcggc 1626 

<210> 8 
<211> 1626
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 8 



atgatgaaac 


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct 


cttccgagca 


ctgcgtaaac 


atagtcacct 


ccctcaagca 


120 


ctcgtggacg


tcgtgggaga 


cgagaacctc 


tcctacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctcca 


taattgtggg 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttattcct 


gtaatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg 


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat 


gggtattagc 


360 


aaacctcaaa 


tcgtctttac 


taccaaaaac 


atcttgaata aggtcttgga 


agtccagtct 


420 


cgtactaact


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480 


gagagcctcc 


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagccc 


540 


ttgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcctccgg 


caccactggt 


600 


ttgcctaaag 


gtgtcatgca 


gactcaccag 


aatatctgtg tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatctct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct


ttggtttctc 


tattaccctg 


ggctatttca 


tggtcggctt 


gcgtgtcatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tccgtgatca 


acgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga 


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc gctgcggttt 


tggtttgact 


1020 


gagagcactt


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc 


ctcttatggc 


tgcaaagatc 


gccgaccgtg agaccggcaa 


agcactgggc 


1140 


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg 


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc gctacaaaga attgattaag 


1320 


tacaaaggct


ctcaagtcgc 


accagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg 


tggccgtcgt 


gggtatccca 


gacttggaag 


ctggcgagtt gcctagcgcc 


1440 


tttgtggtga 


aacaacccgg 


caaggagatc 


actgctaagg 


aggtctacga 


ctatttggcc 


1500 


gagcgcgtgt 


ctcacaccaa 


atatctgcgt 


ggcggcgtcc 


gcttcgtcga 


ttctattcca 


1560 


cgcaacgtta 


ccggtaagat 


cactcgtaaa 


gagttgctga agcaactcct 


cgaaaaagct 


1620 


ggcggc












1626 
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<210> 9 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of a synthetic luciferase



<400> 9 



atgatgaaac


gcgaaaagaa 


cgtgatctac 


ggcccagaac 


cactgcatcc 


actggaagac 


60 


ctcaccgctg 


gtgagatgct 


cttccgagca 


ctgcgtaaac 


atagtcacct 


ccctcaagca 


120 


ctcgtggacg 


tcgtgggaga 


cgagagcctc 


tcctacaaag 


aatttttcga 


agctactgtg 


180 


ctgttggccc 


aaagcctcca 


taattgtggg 


tacaaaatga 


acgatgtggt 


gagcatttgt 


240 


gctgagaata 


acactcgctt 


ctttattcct 


gtaatcgctg 


cttggtacat 


cggcatgatt 


300 


gtcgcccctg


tgaatgaatc 


ttacatccca 


gatgagctgt 


gtaaggttat 


gggtattagc 


360 


aaacctcaaa 


tcgtctttac 


taccaaaaac 


atcttgaata 


aggtcttgga agtccagtct 


420 


cgtactaact 


tcatcaaacg 


catcattatt 


ctggataccg 


tcgaaaacat 


ccacggctgt 


480


gagagcctcc 


ctaacttcat 


ctctcgttac 


agcgatggta 


atatcgctaa 


tttcaagccc 


540 


ttgcattttg 


atccagtcga 


gcaagtggcc 


gctattttgt 


gctcctccgg 


caccactggt 


600 


ttgcctaaag


gtgtcatgca 


gactcaccag 


aatatctgtg 


tgcgtttgat 


ccacgctctc 


660 


gaccctcgtg 


tgggtactca 


attgatccct 


ggcgtgactg 


tgctggtgta 


tctgcctttc 


720 


tttcacgcct 


ttggtttctc 


tattaccctg 


ggctatttca 


tggtcggctt gcgtgtcatc 


780 


atgtttcgtc 


gcttcgacca 


agaagccttc 


ttgaaggcta 


ttcaagacta 


cgaggtgcgt 


840 


tccgtgatca 


acgtcccttc 


agtcattttg 


ttcctgagca 


aatctccttt 


ggttgacaag 


900 


tatgatctga


gcagcttgcg 


tgagctgtgc 


tgtggcgctg 


ctcctttggc 


caaagaagtg 


960 


gccgaggtcg 


ctgctaagcg 


tctgaacctc 


cctggtatcc 


gctgcggttt 


tggtttgact 


1020 


gagagcactt 


ctgctaacat 


ccatagcttg 


cgagacgagt 


ttaagtctgg 


tagcctgggt 


1080 


cgcgtgactc 


ctcttatggc 


tgcaaagatc 


gccgaccgtg 


agaccggcaa 


agcactgggc 


1140 


ccaaatcaag 


tcggtgaatt 


gtgtattaag 


ggccctatgg 


tctctaaagg 


ctacgtgaac 


1200 


aatgtggagg


ccactaaaga 


agccattgat 


gatgatggct 


ggctccatag 


cggcgacttc 


1260 


ggttactatg 


atgaggacga 


acacttctat 


gtggtcgatc 


gctacaaaga 


attgattaag 


1320 


tacaaaggct 


ctcaagtcgc 


accagccgaa 


ctggaagaaa 


ttttgctgaa 


gaacccttgt 


1380 


atccgcgacg 


tggccgtcgt 


gggtatccca 


gacttggaag 


ctggcgagtt 


gcctagcgcc 


1440 


tttgtggtga 


aacaacccgg 


caaggagatc 


actgctaagg 


aggtctacga ctatttggcc 


1500 


gagcgcgtgt


ctcacaccaa 


atatctgcgt 


ggcggcgtcc 


gcttcgtcga 


ttctattcca 


1560 


cgcaacgtta 


ccggtaagat 


cactcgtaaa 


gagttgctga 


agcaactcct 


cgaaaaagct 


1620 


ggcggc 












1626 



<210> 10 
<211> 1626
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<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 10 



atgatgaagc 


gtgagaaaaa


tgtgatttat 


ggtcctgaac 


cattgcatcc 


tctggaggat


60 


ttgactgctg 


gcgaaatgct


gtttcgcgcc


ttgcgcaagc 


acagccatct 


gccacaggct 


120 


ttggtcgacg


tcrcrtccratcia 


tgagtctctg 


agctacaaag 


aattctttga 


ggccaccgtg


180 


ttgctggctc 


aaagcttgca 


caactgtggc


tataagatga atgacgtcgt 


gtctatctgc 


240 


gccgaaaaca 


atactcgttt 


ctttattcct 


gtcatcgctg cctggtatat 


tggtatgatc 


300 


gtggctccag


tcaacgagag 


ctacattcct 


gatgaactgt 


gtaaagtgat 


gggcatctct 


360 


aagccacaga 


ttgtcttcac 


cactaaaaat 


atcttgaaca 


aggtgctgga 


ggtccaaagc


420 


cgcaccaatt


ttattaaacg 


tatcattatc 


ttggacactg 


tggaaaacat 


tcatggttgc 


480 


gagtctctgc 


ctaatttcat 


cagccgctac 


tctgatggca 


acattgccaa 


ttttaaacca 


540 


ttgcacttcg 


accctgtcga 


acaggtggct


gccatcctgt 


gtagctctgg 


taccactggc 


600 


ttgccaaagg 


gtgtcatgca 


aacccatcag 


aacatttgcg 


tgcgtctgat 


ccacgctctc 


660 


gatcctcgct 


acggcactca 


actgattcca 


ggtgtcaccg tgttggtcta 


tctgcctttt 


720 


ttccatgctt


ttggcttcca 


catcactttg 


ggttacttta 


tggtgggcct 


gcgtgtcatt 


780 


atgttccgcc 


gttttgacca 


ggaggccttc


ttgaaagcta 


tccaagatta 


tgaagtgcgc 


840 




a. L.y uy c Lciciy 


rifi ♦* n 3 r 1 /*"■< ^~ r~r 

cyLLaLuu uy 


tttttgtcta agagccctct 


y —A i_ y y cl a a a 


900 


tacgatttgt 


ctagcctgcg 


tgagttgtgt 


tgcggtgccg 


ctccactggc 


caaggaagtc 


960 


gctgaggtgg 


ccgctaaacg 


cttgaacctg 


cctggcattc 


gttgtggttt 


cggcttgacc 


1020 


gaatctacta


gcgccattat 


ccaatctctg 


cgcgacgagt 


ttaagagcgg 


ttctttgggc 


1080 


cgtgtcaccc 


cactgatggc 


tgccaaaatt 


gctgatcgcg aaactggtaa 


ggccttgggc 


1140 


cctaaccagg 


tgggtgagct 


gtgcatcaaa 


ggcccaatgg 


tcagcaaggg 


ttatgtgaat 


1200 


aacgtcgaag 


ctaccaaaga 


ggccattgac 


gatgacggct 


ggttgcattc 


tggtgatttc 


1260 


ggctactatg 


acgaagatga 


gcacttttac 


gtggtcgacc gttataagga 


actgatcaaa 


1320 


tacaagggta


gccaagtggc 


tcctgccgaa 


ttggaggaaa 


ttctgttgaa 


aaatccatgt 


1380 


atccgcgatg 


tcgctgtggt 


cggcattcct 


gacctggagg 


ccggtgaatt 


gccatctgct 


1440 


ttcgtggtca 


agcagcctgg 


caaagagatc 


actgccaagg 


aagtgtatga 


ttacctggct 


1500 


gagcgtgtca 


gccataccaa 


atatttgcgc 


ggtggcgtgc 


gttttgtcga 


ctctattcca 


1560 


cgtaacgtga 


ctggtaagat 


cacccgcaaa 


gaactgttga 


agcaactgtt 


ggagaaagcc 


1620 


ggcggt












1626 



<210> 11 
<211> 1626 
<212> DNA 
<213> Artificial Sequence
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<220> 

<223> Sequence of a synthetic luciferase 



<400> 11 














atgatgaagc


gtgagaaaaa 


tgtgatttat 


ggtcctgaac 


cattgcatcc 


tctggaggat 


60 


ttgactgccg 


gcgaaatgct


gtttcgcgcc


ttgcgcaagc


acagccatct 


gccacaagct 


120 


ttacrtcraaccr 

u uyy i,yy ai*y 


taatcaataa 


tgaatctctg 


agctacaaag 


agttctttga 


ggcaaccgtg 


180 


ttgctggctc


agagcttgca


caactgtggc 


tataagatga atgacgtcgt 


gtctatctgc 


240 


gccgaaaaca 


atactcgttt 


ctttattcct 


gtcatcgctg 


cctggtatat 


tggtatgatc 


300 


i Ocrt~ narf ccac 

-1. v/y I— *H ^— L* V*- ^ C*^-J 


tcaacgagag 


ctacattcct 


gatgaactgt 


gtaaagtgat 


gggcatctct 


360 




ttahcttcac 


cactaaaaat 


atcttgaaca 


aagtgctgga ggtccaaagc 


420 


r'crr'fl rra a t~ t~ 


ttattaaacg


tatcattatc


ttggacactg 


tggaaaacat 


tcatggttgc 


480 




ctaatttcah 


caoccocfcac 


tctgatggca 


acattgccaa


ttttaaacca 


540 


l« L* y I* Gl X* l> L> V»y 


accctctcaa 


acaggtggct 


gccatcctgt 


gtagctctgg 


tactactggc 


600 


J. j i_ L»y i_- i_. cici ciy y 


ci t - cr t~ r 1 3 hara 
y *-y *— **■* ^ **y 


aacccahcaa 


aacatttgcg


tgcgtctgat


ccacgctctc 


660 


y CL L. l~ l« l« I* y 1* U- 


Ci ^> CL V>l»»Cl> 


actgattcct 


ggtgtcaccg 


tgttggtcta 


tctgcctttt


720 


U- L_ L> \«> Ci L> y l» L> Lr 


ttaacttcca 

l» l- y y ^ i> i> v.* i^o. 


catcactttg 


ggttacttta 


tggtgggcct 


gcgtgtcatt 


780 


d i_ y c tuoy 




crna aac t~ t~ t~ r* 

MM C*. V— • ^> L» l» * 


ttgaaagcta 


tccaagatta 


tgaagtgcgc 


840 




ci i~y cicty 


cotcatccta 


tttttgtcta 


agagccctct 


ggtggacaaa 


900 






L y a 3 u *-y *-y <- 


tgcggtgccg 


ctccactggc 


caaggaagtc 


960 


ci r* i~ cr a a a t* era 
y-uyayy uyy 


ccactaaaca 


cttgaacctg 


cctggcattc 


gttgtggttt 


cggcttgacc 


1020 


y ci a c-cll> l»gl 


crccicca ttat 


ccaatctctg 


cgcgacgaat 


ttaagagcgg


ttctttgggc 


1080 


cgtgtcaccc 


caccgatggc 


tgccaaaatt


gctgatcgcg


aaactggtaa 


ggccttgggc 


1140 


cctaaccagg 


tgggtgagct 


gtgcatcaaa 


ggcccaatgg 


tcagcaaggg 


ttatgtgaat 


1200 


aacgtcgaag


ctaccaaaga 


ggccatcgac 


gatgacggct


ggttgcattc 


tggtgatttc 


1260 


ggctactatg 


acgaagatga 


gcacttttac 


gtggtggacc 


gttataagga 


actgatcaaa 


1320 


tacaagggta 


gccaagtggc 


tcctgccgaa 


ttggaggaga 


ttctgttgaa 


aaatccatgt 


1380 


atccgcgatg


tcgctgtggt 


cggcattcct 


gacctggagg 


ccggtgaatt 


gccatctgct 


1440 


ttcgtggtca 


agcagcctgg


taaagagatc 


actgccaagg


aagtgtatga 


ttacctggct 


1500 


gaacgtgtca


gccataccaa 


atatttgcgc


ggtggcgtgc 


gttttgtgga 


ctctattcca 


1560 


cgtaacgtga 


ctggtaagat 


cacccgcaaa gaactgttga 


agcaactgtt 


ggagaaagcc


1620 


ggcggt 












1626 



<210> 12 
<211> 1626
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase
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<400> 12 



atgatgaagc 


gtgagaaaaa 


tgtcatctat ggccctgagc 


ctttgcaccc 


tttggaggat 


60 


ttgactgccg 


gcgaaatgct 


gtttcgcgct ttgcgtaagc actctcattt gcctcaagcc 


120 


ttggtcgatg 


tggtcggcga 


tgaatctttg agctataagg agttttttga ggcaaccgtc 


180 


ttgctggctc


agtctttgca 


taattgcggc 


tacaagatga 


acgacgtcgt 


ctctatttgt 


240 


gccgaaaaca 


atacccgttt 


cttcattcca gtcatcgccg 


cctggtatat 


cggtatgatc 


300 


gtggctccag 


tcaacgagag 


ctacattcct 


gacgaactgt 


gtaaagtcat 


gggtatctct 


360 


aagccacaga 


ttgtgttcac 


cactaagaat 


attttgaaca aagtgctgga 


agtccaaagc 


420 


cgcaccaact 


ttattaagcg 


tatcatcatc 


ttggacactg 


tggagaatat 


tcatggttgc 


480 


gaatctctgc


ctaatttcat


tagccgctat 


tctgacggca 


acatcgccaa 


ctttaaacct 


540 


ttgcatttcg 


accctgtgga 


acaagtggct 


gctatcctgt 


gtagcagcgg 


tactactggc 


600 


ctcccaaagg 


gcgtcatgca 


gacccatcaa 


aacatttgcg tgcgtctgat 


ccatgctctc 


660 


gatccacgct 


acggcactca 


gctgattcct 


ggtgtcaccg 


tcttggtcta 


cctgcctttc 


720 


ttccatgctt 


tcggcttcca 


cattactttg ggttacttta 


tggtcggtct 


gcgtgtcatt 


780 


atgttccgcc


gttttgatca 


ggaggctttt 


ttgaaagcca 


tccaagatta 


tgaagtccgc 


840 


agcgtcatta 


acgtgcctag 


cgtgatcctg 


tttttgtcta 


agagcccact 


cgtggacaag 


900 


tacgacttgt 


cttccctgcg 


tgagttgtgt 


tgcggtgccg 


ccccactggc 


taaggaggtc 


960 


gctgaagtgg 


ccgccaaacg 


cttgaatctg 


ccaggcattc 


gttgtggctt 


cggcctcacc 


1020 


gaatctacca 


gcgctattat 


tcaatctctc 


cgcgatgagt 


ttaagagcgg 


ctctttgggc 


1080 


cgtgtcactc


cactcatggc 


tgctaaaatc 


gctgatcgcg 


aaactggtaa ggctttgggc 


1140 


cctaaccaag 


tgggcgagct 


gtgtatcaaa 


ggccctatgg 


tgagcaaggg 


ttatgtcaat 


1200 


aacgtcgaag 


ctaccaagga 


ggccatcgac 


gacgacggct 


ggctgcattc 


tggtgatttt 


1260 


ggctactacg 


acgaagatga 


gcatttttac 


gtcgtggatc 


gttacaagga 


gctgatcaaa 


1320 


tacaagggta 


gccaggtggc 


tccagccgag 


ttggaggaga 


ttctgttgaa 


aaatccatgc 


1380 


25atccgtgatg 


tcgctgtggt 


cggcattcct gatctggagg ccggtgaact 


gccttctgct 


1440 


ttcgtcgtca 


agcagcctgg 


taaagaaatc 


accgccaaag 


aagtgtatga 


ttacctggct 


1500 


gaacgtgtga 


gccataccaa 


gtacttgcgt 


ggcggcgtgc gttttgtgga 


cagcattcca 


1560 


cgtaatgtga 


ctggtaaaat 


tacccgcaag gaactgttga 


agcaattgtt 


ggagaaggcc 


1620 


ggcggt 












1626 






<210> 13 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of a synthetic luciferase 
<400> 13 

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctttgcatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ttgcgtaaac actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccttgca taattgtggc tacaagatga acgacgtcgt ctccatttgt 240 

gcagaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480 

gaatctttgc ctaattttat tagccgctat tcagacggaa acatcgccaa ctttaagcct 540 

ctccatttcg accctgtgga acaagttgct gcaatcctgt gtagcagcgg tactactgga 600 

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720 

ttccatgctt tcggcttcca tattactttg ggttacttta tggtcggtct gcgtgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctg cccggcattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggctatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgtgga tagcattcct 1560

cgcaatgtga ctggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 14
<211> 1626
<212> DNA

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase

<400> 14

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

cctaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgtcgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 15

<211> 1626

<212> DNA

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 15

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actcttattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540 

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660 

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720 

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960 

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020 

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080 

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140 

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260 

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320 

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440 

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560 

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620 

ggcggt 1626

<210> 16
<211> 1626
<212> DNA

<213> Artificial Sequence
<220>

<223> Sequence of a synthetic luciferase
<400> 16

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 17
<211> 1626
<212> DNA

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 17

atgatgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300 

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360 

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420 

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc ggggatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626






<210> 18 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of a synthetic luciferase 



<400> 18 



atgataaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gtgcgattat ccagactctc ggggatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620

ggcggt 1626

<210> 19
<211> 933
<212> DNA

<213> Renilla reniformis
<400> 19

atgacttcga aagtttatga tccagaacaa aggaaacgga tgataactgg tccgcagtgg 60

tgggccagat gtaaacaaat gaatgttctt gattcattta ttaattatta tgattcagaa 120 

aaacatgcag aaaatgctgt tattttttta catggtaacg cggcctcttc ttatttatgg 180 

cgacatgttg tgccacatat tgagccagta gcgcggtgta ttataccaga tcttattggt 240 

atgggcaaat caggcaaatc tggtaatggt tcttataggt tacttgatca ttacaaatat 300 

cttactgcat ggtttgaact tcttaattta ccaaagaaga tcatttttgt cggccatgat 360

tggggtgctt gtttggcatt tcattatagc tatgagcatc aagataagat caaagcaata 420 

gttcacgctg aaagtgtagt agatgtgatt gaatcatggg atgaatggcc tgatattgaa 480

gaagatattg cgttgatcaa atctgaagaa ggagaaaaaa tggttttgga gaataacttc 540 

ttcgtggaaa ccatgttgcc atcaaaaatc atgagaaagt tagaaccaga agaatttgca 600 

gcatatcttg aaccattcaa agagaaaggt gaagttcgtc gtccaacatt atcatggcct 660

cgtgaaatcc cgttagtaaa aggtggtaaa cctgacgttg tacaaattgt taggaattat 720 

aatgcttatc tacgtgcaag tgatgattta ccaaaaatgt ttattgaatc ggatccagga 780 

ttcttttcca atgctattgt tgaaggcgcc aagaagtttc ctaatactga atttgtcaaa 840 

gtaaaaggtc ttcatttttc gcaagaagat gcacctgatg aaatgggaaa atatatcaaa 900 

tcgttcgttg agcgagttct caaaaatgaa caa 933

<210> 20
<211> 933
<212> DNA

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 20



1 Oa^cicrr't" trra 




cccc aacicacr 


cgcaagcgca 


taatcaccQcr 


ccctcagtgg 


60 


tgggcccgct gcaagcagat gaacgtgctg gactccttca tcaactacta cgacagcgag 120

aagcacgccg agaacgccgt gatcttcctg cacggcaacg ccgcctccag ctacctgtgg 180

aggcacgtgg tgcctcacat cgagcccgtg gcccgctgca tcatccctga cctgatcggc 240

atgggcaagt ccggcaagag cggcaacggc tcctaccgcc tgctggacca ctacaagtac 300

ctgaccgcct ggttcgagct gctgaacctg cccaagaaga tcatcttcgt gggccacgac 360

tggggagcct gcctggcctt ccactactcc tacgagcacc aggacaagat caaggccatc 420

gtgcacgccg agagcgtggt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480

gaggacatcg ccctgatcaa gagcgaggag ggcgagaaga tggtgctgga gaacaacttc 540

ttcgtggaga ccatgctgcc cagcaagatc atgcgcaagc tggagcctga ggagttcgcc 600

gcctacctgg agcccttcaa ggagaagggc gaggtgcgcc gccctaccct gtcctggccc 660

cgcgagatcc ctctggtgaa gggcggcaag cccgacgtgg tgcagatcgt gcgcaactac 720

aacgcctacc tgcgcgccag cgacgacctg cctaagatgt tcatcgagtc cgaccctggc 780

ttcttctcca acgccatcgt cgagggagcc aagaagttcc ccaacaccga gttcgtgaag 840

gtgaagggcc tgcacttctc ccaggaggac gcccctgacg agatgggcaa gtacatcaag 900

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 21
<211> 933
<212> DNA
<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase
<400> 21

atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60 
tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120 
aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180 
aggcacgtcg tgcctcacat cgagcccgtg gctcgctgca tcatccctga tctgatcgga 240 
atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300
ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360
tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420
gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480
gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540
ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600
gcctacctgg agcccttcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660
cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720
aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780
ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840
gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900
agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 22
<211> 933
<212> DNA

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 22



atggcttcca aggtgtacga ccccgagcaa cgcaaacgca tgatcactgg gcctcagtgg 60

tgggctcgct gcaagcaaat gaacgtgctg gactccttca tcaactacta tgattccgag 120

aagcacgccg agaacgccgt gatttttctg catggtaacg ctgcctccag ctacctgtgg 180

aggcacgtcg tgcctcacat cgagcccgtg gctagatgca tcatccctga tctgatcgga 240

atgggtaagt ccggcaagag cgggaatggc tcatatcgcc tcctggatca ctacaagtac 300

ctcaccgctt ggttcgagct gctgaacctt ccaaagaaaa tcatctttgt gggccacgac 360

tggggggctt gtctggcctt tcactactcc tacgagcacc aagacaagat caaggccatc 420

gtccatgctg agagtgtcgt ggacgtgatc gagtcctggg acgagtggcc tgacatcgag 480

gaggatatcg ccctgatcaa gagcgaagag ggcgagaaaa tggtgcttga gaataacttc 540

ttcgtcgaga ccatgctccc aagcaagatc atgcggaaac tggagcctga ggagttcgct 600

gcctacctgg agccattcaa ggagaagggc gaggttagac ggcctaccct ctcctggcct 660

cgcgagatcc ctctcgttaa gggaggcaag cccgacgtcg tccagattgt ccgcaactac 720

aacgcctacc ttcgggccag cgacgatctg cctaagatgt tcatcgagtc cgaccctggg 780

ttcttttcca acgctattgt cgagggagct aagaagttcc ctaacaccga gttcgtgaag 840

gtgaagggcc tccacttcag ccaggaggac gctccagatg aaatgggtaa gtacatcaag 900

agcttcgtgg agcgcgtgct gaagaacgag cag 933

<210> 23
<211> 543
<212> PRT

<213> Pyrophorus plagiophthalamus
<400> 23

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Phe Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Cys Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Lys Arg Phe Phe Ile Pro Ile Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Cys Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Tyr Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Asn Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Leu Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ala Ile
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Val Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Val Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ser Ser Lys Leu
530 535 540

<210> 24
<211> 542
<212> PRT

<213> Artificial Sequence

<220>

<223> Sequence of clone YG#81-6G01
<400> 24

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Ala
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 25
<211> 542
<212> PRT

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase
<400> 25

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 26

<211> 542

<212> PRT

<213> Artificial Sequence

<220>

<223> Sequence of a synthetic luciferase

<400> 26

Met Met Lys Arg Glu Lys Asn Val He Tyr Gly Pro Glu Pro Leu His 
15 10 15 

40Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
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20 25 30 

Lys His Ser His Leu Pro Gin Ala Leu Val Asp Val Val Gly Asp Glu 

35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gin 
5 50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser He Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe He Pro Val He Ala Ala Trp Tyr 
85 90 95 

lOIle Gly Met He Va\ Ala Pro Val Asn Glu Ser Tyr He Pro Asp Glu 
100 105 HO 

Leu Cys Lys Val Met Gly He' Ser Lys Pro Gin He Val Phe Thr Thr 

115 120 125 

Lys Asn He Leu Asn Lys Val Leu Glu Val Gin Ser Arg Thr Asn Phe 
15 130 135 140 

He Lys Arg He He He Leu Asp Thr Val Glu Asn He His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe lie Ser Arg Tyr Ser Asp Gly Asn He Ala 
165 170 175 

20Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gin Val Ala Ala lie 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gin Thr 

195 200 205 

His Gin Asn He Cys Val Arg Leu lie His Ala Leu Asp Pro Arg Val 
25 210 215 220 

Gly Thr Gin Leu lie Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe Ser He Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

30Leu Arg Val He Met Phe Arg Arg Phe Asp Gin Glu Ala Phe Leu Lys 
260 265 270 

Ala lie Gin Asp Tyr Glu Val Arg Ser Val He Asn Val Pro Ser Val 

275 280 285 

lie Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
35 290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly lie Arg Cys Gly 
325 330 335 

40Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn He His Ser Leu Arg Asp 
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340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 

355 360 365 

Lys He Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gin Val 
5 370 375 380 

Gly Glu Leu Cys He Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala He Asp Asp Asp Gly Trp Leu His 
405 410 415 

lOSer Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu He Lys Tyr Lys Gly Ser Gin Val Ala Pro 

435 440 445 

Ala Glu Leu Glu Glu He Leu Leu Lys Asn Pro Cys lie Arg Asp Val 
15 450 455 460 

Ala Val Val Gly He Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gin Pro Gly Lys Glu He Thr Ala Lys Glu Val Tyr 
485 490 495 

20Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser He Pro Arg Asn Val Thr Gly Lys He Thr 

515 520 525 

Arg Lys Glu Leu Leu Lys Gin Leu Leu Glu Lys Ala Gly Gly 
25 530 535 540 

<210> 27
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 27

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 28
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 28

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 29
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 29

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 30
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 30

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Asn Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Ser Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 31
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 31

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 32
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 32

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 33
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 33

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 34
<211> 542
<212> PRT
<213> Artificial Sequence

<220>
<223> Sequence of a synthetic luciferase

<400> 34

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 35
<211> 29
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 35
acgccagccc aagcttaggc ctgagtggc 29

<210> 36
<211> 44
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 36
cttaattctc cccatccccc tgttgacaat taatcatcgg ctcg 44

<210> 37
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 37
tataatgtga ggaattgcga gcggataaca atttcacaca 40

<210> 38
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 38
atgggatgtt acctagacca atatgaaata tttggtaaat 40

<210> 39
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 39
aaatgcttaa tgaatttcaa aaaaaaaaaa aaaggaattc 40

<210> 40
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 40
gatatcaagc ttatcgatac cgtcgacctc gaggattata 40

<210> 41
<211> 37
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 41
tagaaaaagg cctcggcggc cgctagttca gtcagtt 37

<210> 42
<211> 17
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 42
aactgactga actagcg 17

<210> 43
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 43
gccgccgagg cctttttcta tataatcctc gaggtcgacg 40

<210> 44
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 44
gtatcgataa gcttgatatc gaattccttt tttttttttt 40

<210> 45
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 45
agcttgatat cgaattcctt tttttttttt tttgaaattc 40

<210> 46
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 46
ttgaaattca ttaagcattt atttaccaaa tatttcatat 40

<210> 47
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 47
tggtctaggt aacatcccat cactagcttt tttttctata 40

<210> 48
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 48
tcgcaattcc tcacattata cgagccgatg attaattgtc 40

<210> 49
<211> 53
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 49
aacaggggga tggggagaat taaggccact caggcctaag cttgggctgg cgt 53

<210> 50
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 50
ggaaacagga tcccatgatg aaacgcgaaa agaacgtgat 40

<210> 51
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 51
ctacggccca gaaccactgc atccactgga agacctcacc 40

<210> 52
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 52
gctggtgaga tgctcttccg agcactgcgt aaacatagtc 40

<210> 53
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 53
acctccctca agcactcgtg gacgtcgtgg gagacgagag 40

<210> 54
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 54
cctctcctac aaagaatttt tcgaagctac tgtgctgttg 40

<210> 55
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 55
gcccaaagcc tccataattg tgggtacaaa atgaacgatg 40



<210> 56
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 56
tggtgagcat ttgtgctgag aataacactc gcttctttat 40

<210> 57
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 57
tcctgtaatc gctgcttggt acatcggcat gattgtcgcc 40

<210> 58
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 58
cctgtgaatg aatcttacat cccagatgag ctgtgtaagg 40

<210> 59
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 59
ttatgggtat tagcaaacct caaatcgtct ttactaccaa 40

<210> 60
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 60
aaacatcttg aataaggtct tggaagtcca gtctcgtact 40

<210> 61
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 61
aacttcatca aacgcatcat tattctggat accgtcgaaa 40

<210> 62
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 62
acatccacgg ctgtgagagc ctccctaact tcatctctcg 40

<210> 63
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 63
ttacagcgat ggtaatatcg ctaatttcaa gcccttgcat 40

<210> 64
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 64
tttgatccag tcgagcaagt ggccgctatt ttgtgctcct 40



<210> 65
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 65
ccggcaccac tggtttgcct aaaggtgtca tgcagactca 40

<210> 66
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 66
ccagaatatc tgtgtgcgtt tgatccacgc tctcgaccct 40



<210> 67
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 67
cgtgtgggta ctcaattgat ccctggcgtg actgtgctgg 40

<210> 68
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 68
tgtatctgcc tttctttcac gcctttggtt tctctattac 40

<210> 69
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 69
cctgggctat ttcatggtcg gcttgcgtgt catcatgttt 40

<210> 70
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 70
cgtcgcttcg accaagaagc cttcttgaag gctattcaag 40

<210> 71
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 71
actacgaggt gcgttccgtg atcaacgtcc cttcagtcat 40

<210> 72
<211> 43
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 72
tttgttcctg agcaaatctc ctttggttga caagtatgat ctg 43

<210> 73
<211> 37
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 73
agcagcttgc gtgagctgtg ctgtggcgct gctcctt 37

<210> 74
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 74
tggccaaaga agtggccgag gtcgctgcta agcgtctgaa 40

<210> 75
<211> 40
<212> DNA
<213> Artificial Sequence

<220>
<223> An oligonucleotide

<400> 75
cctccctggt atccgctgcg gttttggttt gactgagagc 40

<210> 76 
<211> 40 
20<212> DNA 

<213> Artificial Sequence 

<220> 

<223> An oligonucleotide 

25 

<400> 76 

acttctgcta acatccatag cttgcgagac gagtttaagt 40 

<210> 77 
30<211> 40 
<212> DNA 

<213> Artificial Sequence 
<220> 

35<223> An oligonucleotide 



<400> 77 



ctggtagcct gggtcgcgtg actcctctta tggctgcaaa 



40 



<210> 78
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 78
gatcgccgac cgtgagaccg gcaaagcact gggcccaaat 40

<210> 79
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 79
caagtcggtg aattgtgtat taagggccct atggtctcta 40

<210> 80
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 80
aaggctacgt gaacaatgtg gaggccacta aagaagccat 40

<210> 81
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 81
tgatgatgat ggctggctcc atagcggcga cttcggttac 40

<210> 82
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 82
tatgatgagg acgaacactt ctatgtggtc gatcgctaca 40

<210> 83
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 83
aagaattgat taagtacaaa ggctctcaag tcgcaccagc 40

<210> 84
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 84
cgaactggaa gaaattttgc tgaagaaccc ttgtatccgc 40

<210> 85
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 85
gacgtggccg tcgtgggtat cccagacttg gaagctggcg

<210> 86
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 86
agttgcctag cgcctttgtg gtgaaacaac ccggcaagga

<210> 87
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 87
gatcactgct aaggaggtct acgactattt ggccgagcgc

<210> 88
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 88
gtgtctcaca ccaaatatct gcgtggcggc gtccgcttcg

<210> 89
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 89
tcgattctat tccacgcaac gttaccggta agatcactcg

<210> 90
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 90
taaagagttg ctgaagcaac tcctcgaaaa agctggcggc

<210> 91
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 91
tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg

<210> 92
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 92
taatcatgaa gactttacta gccgccagct ttttcgagga 40

<210> 93
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 93
gttgcttcag caactcttta cgagtgatct taccggtaac 40

<210> 94
<211> 39
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 94
gttgcgtgga atagaatcga cgaagcggac gccgccacg 39

<210> 95
<211> 41
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 95
cagatatttg gtgtgagaca cgcgctcggc caaatagtcg t 41

<210> 96
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 96
agacctcctt agcagtgatc tccttgccgg gttgtttcac 40

<210> 97
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 97
cacaaaggcg ctaggcaact cgccagcttc caagtctggg 40

<210> 98
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 98
atacccacga cggccacgtc gcggatacaa gggttcttca 40

<210> 99
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 99
gcaaaatttc ttccagttcg gctggtgcga cttgagagcc 40

<210> 100
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 100
tttgtactta atcaattctt tgtagcgatc gaccacatag 40

<210> 101
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 101
aagtgttcgt cctcatcata gtaaccgaag tcgccgctat 40

<210> 102
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 102
ggagccagcc atcatcatca atggcttctt tagtggcctc 40

<210> 103
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 103
cacattgttc acgtagcctt tagagaccat agggccctta 40

<210> 104
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 104
atacacaatt caccgacttg atttgggccc agtgctttgc 40

<210> 105
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 105
cggtctcacg gtcggcgatc tttgcagcca taagaggagt 40

<210> 106
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 106
cacgcgaccc aggctaccag acttaaactc gtctcgcaag 40

<210> 107
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 107
ctatggatgt tagcagaagt gctctcagtc aaaccaaaac 40

<210> 108
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 108
cgcagcggat accagggagg ttcagacgct tagcagcgac 40

<210> 109
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 109
ctcggccact tctttggcca aaggagcagc gccacagcac 40

<210> 110
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 110
agctcacgca agctgctcag atcatacttg tcaaccaaag 40

<210> 111
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 111
gagatttgct caggaacaaa atgactgaag ggacgttgat

<210> 112
<211> 36
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 112
cacggaacgc acctcgtagt cttgaatagc cttcaa

<210> 113
<211> 44
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 113
gaaggcttct tggtcgaagc gacgaaacat gatgacacgc aagc

<210> 114
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 114
cgaccatgaa atagcccagg gtaatagaga aaccaaaggc

<210> 115
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 115
gtgaaagaaa ggcagataca ccagcacagt cacgccaggg

<210> 116
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 116
atcaattgag tacccacacg agggtcgaga gcgtggatca

<210> 117
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 117
aacgcacaca gatattctgg tgagtctgca tgacaccttt

<210> 118
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 118
aggcaaacca gtggtgccgg aggagcacaa aatagcggcc

<210> 119
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 119
acttgctcga ctggatcaaa atgcaagggc ttgaaattag

<210> 120
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 120
cgatattacc atcgctgtaa cgagagatga agttagggag

<210> 121
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 121
gctctcacag ccgtggatgt tttcgacggt atccagaata

<210> 122
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 122
atgatgcgtt tgatgaagtt agtacgagac tggacttcca

<210> 123
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 123
agaccttatt caagatgttt ttggtagtaa agacgatttg

<210> 124
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 124
aggtttgcta atacccataa ccttacacag ctcatctggg

<210> 125
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 125
atgtaagatt cattcacagg ggcgacaatc atgccgatgt

<210> 126
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 126
accaagcagc gattacagga ataaagaagc gagtgttatt

<210> 127
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 127
ctcagcacaa atgctcacca catcgttcat tttgtaccca

<210> 128
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 128
caattatgga ggctttgggc caacagcaca gtagcttcga

<210> 129
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 129
aaaattcttt gtaggagagg ctctcgtctc ccacgacgtc 40

<210> 130
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 130
cacgagtgct tgagggaggt gactatgttt acgcagtgct 40

<210> 131
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 131
cggaagagca tctcaccagc ggtgaggtct tccagtggat 40

<210> 132
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 132
gcagtggttc tgggccgtag atcacgttct tttcgcgttt 40

<210> 133
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 133
catcatggga tcctgtttcc tgtgtgaaat tgttatccgc

<210> 134
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 134
ggaaacagga tcccatgatg aagcgtgaga aaaatgtcat

<210> 135
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 135
ctatggccct gagcctctcc atcctttgga ggatttgact

<210> 136
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 136
gccggcgaaa tgctgtttcg tgctctccgc aagcactctc

<210> 137
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 137
atttgcctca agccttggtc gatgtggtcg gcgatgaatc

<210> 138
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 138
tttgagctac aaggagtttt ttgaggcaac cgtcttgctg

<210> 139
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 139
gctcagtccc tccacaattg tggctacaag atgaacgacg

<210> 140
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 140
tcgttagtat ctgtgctgaa aacaataccc gtttcttcat

<210> 141
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 141
tccagtcatc gccgcatggt atatcggtat gatcgtggct

<210> 142
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 142
ccagtcaacg agagctacat tcccgacgaa ctgtgtaaag

<210> 143
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 143
tcatgggtat ctctaagcca cagattgtct tcaccactaa

<210> 144
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 144
gaatattctg aacaaagtcc tggaagtcca aagccgcacc

<210> 145
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 145
aactttatta agcgtatcat catcttggac actgtggaga

<210> 146
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 146
atattcacgg ttgcgaatct ttgcctaatt tcatctctcg

<210> 147
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 147
ctattcagac ggcaacatcg caaactttaa accactccac

<210> 148
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 148
ttcgaccctg tggaacaagt tgcagccatt ctgtgtagca

<210> 149
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 149
gcggtactac tggactccca aagggagtca tgcagaccca

<210> 150
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 150
tcaaaacatt tgcgtgcgtc tgatccatgc tctcgatcca

<210> 151
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 151
cgctacggca ctcagctgat tcctggtgtc accgtcttgg

<210> 152
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 152
tctacttgcc tttcttccat gctttcggct ttcatattac

<210> 153
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 153
tttgggttac tttatggtcg gtctccgcgt gattatgttc

<210> 154
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 154
cgccgttttg atcaggaggc tttcttgaaa gccatccaag

<210> 155
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 155
attatgaagt ccgcagtgtc atcaacgtgc ctagcgtgat

<210> 156
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 156
cctgtttttg tctaagagcc cactcgtgga caagtacgac

<210> 157
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 157
ttgtcttcac tgcgtgaatt gtgttgcggt gccgctccac

<210> 158
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 158
tggctaagga ggtcgctgaa gtggccgcca aacgcttgaa 40

<210> 159
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 159
tcttccaggg attcgttgtg gcttcggcct caccgaatct 40

<210> 160
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 160
accagcgcta ttattcagtc tctccgcgat gagtttaaga 40

<210> 161
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 161
gcggctcttt gggccgtgtc actccactca tggctgctaa 40

<210> 162
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 162
gatcgctgat cgcgaaactg gtaaggcttt gggccctaac 40

<210> 163
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 163
caagtgggcg agctgtgtat caaaggccct atggtgagca 40

<210> 164
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 164
agggttatgt caataacgtc gaagctacca aggaggccat 40

<210> 165
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 165
cgacgacgac ggctggttgc attctggtga ttttggatat 40

<210> 166
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 166
tacgacgaag atgagcattt ttacgtcgtg gatcgttaca

<210> 167
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 167
aggagctgat caaatacaag ggtagccagg ttgctccagc

<210> 168
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 168
tgagttggag gagattctgt tgaaaaatcc atgcattcgc

<210> 169
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 169
gatgtcgctg tggtcggcat tcctgatctg gaggccggcg 40

<210> 170
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 170
aactgccttc tgctttcgtt gtcaagcagc ctggtaaaga 40

<210> 171
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 171
aattaccgcc aaagaagtgt atgattacct ggctgaacgt 40

<210> 172
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 172
gtgagccata ctaagtactt gcgtggcggc gtgcgttttg 40

<210> 173
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 173
ttgactccat ccctcgtaac gtaacaggca aaattacccg

<210> 174
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 174
caaggagctg ttgaaacaat tgttggagaa ggccggcggt

<210> 175
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 175
tagtaaagtc ttcatgatta tatagaaaaa aaagctagtg

<210> 176
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 176
taatcatgaa gactttacta accgccggcc ttctccaaca

<210> 177
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 177
attgtttcaa cagctccttg cgggtaattt tgcctgttac

<210> 178
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 178
gttacgaggg atggagtcaa caaaacgcac gccgccacgc

<210> 179
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 179
aagtacttag tatggctcac acgttcagcc aggtaatcat

<210> 180
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide



WO 02/16944 




<400> 180 

acacttcttt ggcggtaatt tctttaccag gctgcttgac 

<210> 181
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 181
aacgaaagca gaaggcagtt cgccggcctc cagatcagga

<210> 182
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 182
atgccgacca cagcgacatc gcgaatgcat ggatttttca

<210> 183
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 183
acagaatctc ctccaactca gctggagcaa cctggctacc

<210> 184
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 184
cttgtatttg atcagctcct tgtaacgatc cacgacgtaa

<210> 185
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 185
aaatgctcat cttcgtcgta atatccaaaa tcaccagaat

<210> 186
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 186
gcaaccagcc gtcgtcgtcg atggcctcct tggtagcttc

<210> 187
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 187
gacgttattg acataaccct tgctcaccat agggcctttg

<210> 188
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 188
atacacagct cgcccacttg gttagggccc aaagccttac

<210> 189
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 189
cagtttcgcg atcagcgatc ttagcagcca tgagtggagt

<210> 190
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 190
gacacggccc aaagagccgc tcttaaactc atcgcggaga

<210> 191
<211> 37
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 191
gactgaataa tagcgctggt agattcggtg aggccga 37

<210> 192
<211> 43
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 192
agccacaacg aatccctgga agattcaagc gtttggcggc cac 43

<210> 193
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 193
ttcagcgacc tccttagcca gtggagcggc accgcaacac 40

<210> 194
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 194
aattcacgca gtgaagacaa gtcgtacttg tccacgagtg 40

<210> 195
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 195
ggctcttaga caaaaacagg atcacgctag gcacgttgat

<210> 196
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 196
gacactgcgg acttcataat cttggatggc tttcaagaaa

<210> 197
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 197
gcctcctgat caaaacggcg gaacataatc acgcggagac

<210> 198
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 198
cgaccataaa gtaacccaaa gtaatatgaa agccgaaagc



<210> 199
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 199
atggaagaaa ggcaagtaga ccaagacggt gacaccagga 40

<210> 200
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 200
atcagctgag tgccgtagcg tggatcgaga gcatggatca 40

<210> 201
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 201
gacgcacgca aatgttttga tgggtctgca tgactccctt 40

<210> 202
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 202
tgggagtcca gtagtaccgc tgctacacag aatggctgca 40

<210> 203
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 203
acttgttcca cagggtcgaa gtggagtggt ttaaagtttg 40

<210> 204
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 204
cgatgttgcc gtctgaatag cgagagatga aattaggcaa 40

<210> 205
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 205
agattcgcaa ccgtgaatat tctccacagt gtccaagatg 40

<210> 206
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 206
atgatacgct taataaagtt ggtgcggctt tggacttcca

<210> 207
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 207
ggactttgtt cagaatattc ttagtggtga agacaatctg

<210> 208
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 208
tggcttagag atacccatga ctttacacag ttcgtcggga

<210> 209
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 209
atgtagctct cgttgactgg agccacgatc ataccgatat

<210> 210
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 210
accatgcggc gatgactgga atgaagaaac gggtattgtt 40

<210> 211
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 211
ttcagcacag atactaacga cgtcgttcat cttgtagcca 40

<210> 212
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 212
caattgtgga gggactgagc cagcaagacg gttgcctcaa 40

<210> 213
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 213
aaaactcctt gtagctcaaa gattcatcgc cgaccacatc

<210> 214
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 214
gaccaaggct tgaggcaaat gagagtgctt gcggagagca

<210> 215
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 215
cgaaacagca tttcgccggc agtcaaatcc tccaaaggat

<210> 216
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 216
ggagaggctc agggccatag atgacatttt tctcacgctt

<210> 217
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> An oligonucleotide
<400> 217
catcatggga tcctgtttcc tgtgtgaaat tgttatccgc

<210> 218
<211> 542
<212> PRT
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 218
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15
Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30
Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45
Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60
Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80
Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95
Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110
Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125
Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140
Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160
Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175
Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190
Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205
His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220
Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240
Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255
Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270
Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285
Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300
Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320
Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335
Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350
Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365
Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380
Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400
Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415
Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430
Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445
Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460
Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480
Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495
Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510
Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525
Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 219
<211> 542
<212> PRT
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 219
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15
Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30
Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45
Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60
Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80
Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95
Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110
Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125
Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140
Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160
Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175
Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190
Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205
His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr






PCT/US01/26566 



210 215 220
Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240
Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255
Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270
Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285
Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300
Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320
Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335
Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350
Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365
Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380
Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400
Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415
Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430
Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445
Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460
Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480
Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495
Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510
Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525
Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540

<210> 220
<211> 542
<212> PRT
<213> Artificial Sequence
<220>
<223> Sequence of a synthetic luciferase
<400> 220
Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15
Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30
Lys His Ser Tyr Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45
Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60
Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80
Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95
Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110
Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125
Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140
Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160
Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175
Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190
Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205
His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220
Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240
Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255
Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270
Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285
Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300
Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320
Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335
Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350
Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365
Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380
Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400
Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415
Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430
Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445
Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460
Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480
Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495
Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
35 500 505 510 

Val Arg Phe Val Asp Ser He Pro Arg Asn Val Thr Gly Lys He Thr 

515 520 525 

Arg Lys Glu Leu Leu Lys Gin Leu Leu Glu Lys Ala Gly Gly 
530 535 540 

40 
<210> 221 
<211> 542 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 221 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly 
530 535 540 

<210> 222 
<211> 542 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 222 

Met Met Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Gly Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly 
530 535 540 

<210> 223 
<211> 542 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 223 

Met Ile Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His 
1 5 10 15 

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg 
20 25 30 

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu 
35 40 45 

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln 
50 55 60 

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys 
65 70 75 80 

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr 
85 90 95 

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu 
100 105 110 

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr 
115 120 125 

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe 
130 135 140 

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys 
145 150 155 160 

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala 
165 170 175 

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile 
180 185 190 

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr 
195 200 205 

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr 
210 215 220 

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe 
225 230 235 240 

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly 
245 250 255 

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys 
260 265 270 

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val 
275 280 285 

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser 
290 295 300 

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val 
305 310 315 320 

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly 
325 330 335 

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp 
340 345 350 

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala 
355 360 365 

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val 
370 375 380 

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn 
385 390 395 400 

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His 
405 410 415 

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val 
420 425 430 

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro 
435 440 445 

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val 
450 455 460 

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala 
465 470 475 480 

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr 
485 490 495 

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly 
500 505 510 

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr 
515 520 525 

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly 
530 535 540 

<210> 224 
<211> 311 
<212> PRT 
<213> Renilla reniformis 

<400> 224 

Met Thr Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr 
1 5 10 15 

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser 
20 25 30 

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile 
35 40 45 

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val 
50 55 60 

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly 
65 70 75 80 

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp 
85 90 95 

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys 
100 105 110 

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His 
115 120 125 

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu 
130 135 140 

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu 
145 150 155 160 

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu 
165 170 175 

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg 
180 185 190 

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu 
195 200 205 

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro 
210 215 220 

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr 
225 230 235 240 

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu 
245 250 255 

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys 
260 265 270 

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln 
275 280 285 

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu 
290 295 300 

Arg Val Leu Lys Asn Glu Gln 
305 310 

<210> 225 
<211> 311 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 225 

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr 
1 5 10 15 

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser 
20 25 30 

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile 
35 40 45 

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val 
50 55 60 

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly 
65 70 75 80 

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp 
85 90 95 

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys 
100 105 110 

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His 
115 120 125 

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu 
130 135 140 

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu 
145 150 155 160 

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu 
165 170 175 

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg 
180 185 190 

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu 
195 200 205 

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro 
210 215 220 

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr 
225 230 235 240 

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu 
245 250 255 

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys 
260 265 270 

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln 
275 280 285 

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu 
290 295 300 

Arg Val Leu Lys Asn Glu Gln 
305 310 

<210> 226 
<211> 311 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 226 

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr 
1 5 10 15 

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser 
20 25 30 

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile 
35 40 45 

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val 
50 55 60 

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly 
65 70 75 80 

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp 
85 90 95 

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys 
100 105 110 

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His 
115 120 125 

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu 
130 135 140 

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu 
145 150 155 160 

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu 
165 170 175 

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg 
180 185 190 

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu 
195 200 205 

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro 
210 215 220 

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr 
225 230 235 240 

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu 
245 250 255 

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys 
260 265 270 

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln 
275 280 285 

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu 
290 295 300 

Arg Val Leu Lys Asn Glu Gln 
305 310 

<210> 227 
<211> 311 
<212> PRT 
<213> Artificial Sequence 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 227 

Met Ala Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr 
1 5 10 15 

Gly Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser 
20 25 30 

Phe Ile Asn Tyr Tyr Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile 
35 40 45 

Phe Leu His Gly Asn Ala Ala Ser Ser Tyr Leu Trp Arg His Val Val 
50 55 60 

Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp Leu Ile Gly 
65 70 75 80 

Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp 
85 90 95 

His Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys 
100 105 110 

Lys Ile Ile Phe Val Gly His Asp Trp Gly Ala Cys Leu Ala Phe His 
115 120 125 

Tyr Ser Tyr Glu His Gln Asp Lys Ile Lys Ala Ile Val His Ala Glu 
130 135 140 

Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro Asp Ile Glu 
145 150 155 160 

Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu 
165 170 175 

Glu Asn Asn Phe Phe Val Glu Thr Met Leu Pro Ser Lys Ile Met Arg 
180 185 190 

Lys Leu Glu Pro Glu Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys Glu 
195 200 205 

Lys Gly Glu Val Arg Arg Pro Thr Leu Ser Trp Pro Arg Glu Ile Pro 
210 215 220 

Leu Val Lys Gly Gly Lys Pro Asp Val Val Gln Ile Val Arg Asn Tyr 
225 230 235 240 

Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile Glu 
245 250 255 

Ser Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys 
260 265 270 

Phe Pro Asn Thr Glu Phe Val Lys Val Lys Gly Leu His Phe Ser Gln 
275 280 285 

Glu Asp Ala Pro Asp Glu Met Gly Lys Tyr Ile Lys Ser Phe Val Glu 
290 295 300 

Arg Val Leu Lys Asn Glu Gln 
305 310 

<210> 228 
<211> 14 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A consensus sequence 

<221> misc_feature 
<222> (1)...(14) 
<223> n = A,T,C or G 

<400> 228 
yggmnnnnng ccaa 

<210> 229 
<211> 38 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 229 
gtactgagac gacgccagcc caagcttagg cctgagtg 

<210> 230 
<211> 38 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 230 
ggcatgagcg tgaactgact gaactagcgg ccgccgag 

<210> 231 
<211> 24 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 231 
ggatcccatg gtgaagcgtg agaa 

<210> 232 
<211> 21 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 232 
ggatcccatg gtgaaacgcg a 21 

<210> 233 
<211> 31 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 233 
ctagcttttt tttctagata atcatgaaga c 31 

<210> 234 
<211> 54 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 234 
caaaaagctt ggcattccgg tactgttggt aaagccacca tggtgaagcg agag 54 

<210> 235 
<211> 26 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 235 
caattgttgt tgttaacttg tttatt 

<210> 236 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 236 
aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 

<210> 237 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 237 
gctctagaat tactgctcgt tcttcagcac gcgctccacg 

<210> 238 
<211> 31 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 238 
cgctagccat ggcttcgaaa gtttatgatc c 

<210> 239 
<211> 25 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 239 
ggccagtaac tctagaatta ttgtt 

<210> 240 
<211> 5 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 240 
tataa 

<210> 241 
<211> 6 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 241 
stratg 

<210> 242 
<211> 9 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<221> misc_feature 
<222> (1)...(9) 
<223> n = A,T,C or G 

<400> 242 
mttncnnma 

<210> 243 
<211> 5 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 243 
tratg 

<210> 244 
<211> 7 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A consensus sequence 

<400> 244 
tgastma 

<210> 245 
<211> 14 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A consensus sequence 

<221> misc_feature 
<222> (1)...(14) 
<223> n = A,T,C or G 

<400> 245 
yggmnnnnng ccaa 

<210> 246 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 246 
aaccatggct tccaaggtgt acgaccccga gcaacgcaaa 

<210> 247 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 247 
cgcatgatca ctgggcctca gtggtgggct cgctgcaagc 

<210> 248 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 248 
aaatgaacgt gctggactcc ttcatcaact actatgattc 

<210> 249 
<211> 50 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 249 
cgagaagcac gccgagaacg ccgtgatttt tctgcatggt aacgctgcct 50 

<210> 250 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 250 
ccagctacct gtggaggcac gtcgtgcctc acatcgagcc 40 

<210> 251 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 251 
cgtggctaga tgcatcatcc ctgatctgat cggaatgggt 40 

<210> 252 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 252 
aagtccggca agagcgggaa tggctcatat cgcctcctgg 40 

<210> 253 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 253 
atcactacaa gtacctcacc gcttggttcg agctgctgaa 

<210> 254 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 254 
ccttccaaag aaaatcatct ttgtgggcca cgactggggg 

<210> 255 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 255 
gcttgtctgg cctttcacta ctcctacgag caccaagaca 

<210> 256 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 256 
agatcaaggc catcgtccat gctgagagtg tcgtggacgt 

<210> 257 
<211> 45 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 257 
gatcgagtcc tgggacgagt ggcctgacat cgaggaggat atcgc 

<210> 258 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 258 
cctgatcaag agcgaagagg gcgagaaaat ggtgcttgag 

<210> 259 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 259 
aataacttct tcgtcgagac catgctccca agcaagatca 

<210> 260 
<211> 45 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 260 
tgcggaaact ggagcctgag gagttcgctg cctacctgga gccat 45 

<210> 261 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 261 
tcaaggagaa gggcgaggtt agacggccta ccctctcctg 40 

<210> 262 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 262 
gcctcgcgag atccctctcg ttaagggagg caagcccgac 40 

<210> 263 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 263 
gtcgtccaga ttgtccgcaa ctacaacgcc taccttcggg 40 

<210> 264 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 264 
ccagcgacga tctgcctaag atgttcatcg agtccgaccc 

<210> 265 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 265 
tgggttcttt tccaacgcta ttgtcgaggg agctaagaag 

<210> 266 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 266 
ttccctaaca ccgagttcgt gaaggtgaag ggcctccact 

<210> 267 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 267 
tcagccagga ggacgctcca gatgaaatgg gtaagtacat 

<210> 268 
<211> 49 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 268 
caagagcttc gtggagcgcg tgctgaagaa cgagcagtaa ttctagagc 

<210> 269 
<211> 29 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 269 
gctctagaat tactgctcgt tcttcagca 

<210> 270 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 270 
cgcgctccac gaagctcttg atgtacttac ccatttcatc 

<210> 271 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 271 
tggagcgtcc tcctggctga agtggaggcc cttcaccttc 

<210> 272 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 272 
acgaactcgg tgttagggaa cttcttagct ccctcgacaa 

<210> 273 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 273 
tagcgttgga aaagaaccca gggtcggact cgatgaacat 

<210> 274 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 274 
cttaggcaga tcgtcgctgg cccgaaggta ggcgttgtag 

<210> 275 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 275 
ttgcggacaa tctggacgac gtcgggcttg cctcccttaa 

<210> 276 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 276 
cgagagggat ctcgcgaggc caggagaggg taggccgtct 

<210> 277 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 277 
aacctcgccc ttctccttga atggctccag gtaggcagcg 

<210> 278 
<211> 45 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 278 
aactcctcag gctccagttt ccgcatgatc ttgcttggga gcatg 

<210> 279 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 279 
gtctcgacga agaagttatt ctcaagcacc attttctcgc 

<210> 280 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 280 
cctcttcgct cttgatcagg gcgatatcct cctcgatgtc 

<210> 281 
<211> 43 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 281 
aggccactcg tcccaggact cgatcacgtc cacgacactc tca 

<210> 282 
<211> 42 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 282 
gcatggacga tggccttgat cttgtcttgg tgctcgtagg ag 

<210> 283 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 283 
tagtgaaagg ccagacaagc cccccagtcg tggcccacaa 

<210> 284 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 284 
agatgatttt ctttggaagg ttcagcagct cgaaccaagc 

<210> 285 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 285 
ggtgaggtac ttgtagtgat ccaggaggcg atatgagcca 

<210> 286 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 286 
ttcccgctct tgccggactt acccattccg atcagatcag 

<210> 287 
<211> 45 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 287 
ggatgatgca tctagccacg ggctcgatgt gaggcacgac gtgcc 

<210> 288 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 288 
tccacaggta gctggaggca gcgttaccat gcagaaaaat 

<210> 289 
<211> 45 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 289 
cacggcgttc tcggcgtgct tctcggaatc atagtagttg atgaa 

<210> 290 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 290 
ggagtccagc acgttcattt gcttgcagcg agcccaccac 

<210> 291 
<211> 40 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 291 
tgaggcccag tgatcatgcg tttgcgttgc tcggggtcgt 

<210> 292 
<211> 20 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 292 
acaccttgga agccatggtt 

<210> 293 
<211> 10 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A Kozak sequence 

<400> 293 
aaccatggct 10 

<210> 294 
<211> 12 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> An oligonucleotide 

<400> 294 
taattctaga gc 12 

<210> 295 
<211> 32 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 295 
gcgtagccat ggtaaagcgt gagaaaaatg tc 32 

<210> 296 
<211> 33 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> A primer 

<400> 296 
ccgactctag attactaacc gccggccttc acc 

<210> 297 
<211> 1626 
<212> DNA 
<213> Artificial Sequence 

PCT/US01/26566 

<220> 
<223> Sequence of a synthetic luciferase 

<400> 297 

Satggtgaaac gcgaaaagaa cgtgatctac ggcccagaac cactgcatcc actggaagac 60 

ctcaccgctg gtgagatgct cttccgagca ctgcgtaaac atagtcacct ccctcaagca 120 

ctcgtggacg tcgtgggaga cgagagcctc tcctacaaag aatttttcga agctactgtg 180 

ctgttggccc aaagcctcca taattgtggg tacaaaatga acgatgtggt gagcatttgt 240 

gctgagaata acactcgctt ctttattcct gtaatcgctg cttggtacat cggcatgatt 300 

lOgtcgcccctg tgaatgaatc ttacatccca gatgagctgt gtaaggttat gggtattagc 360 

aaacctcaaa tcgtctttac taccaaaaac atcttgaata aggtcttgga agtccagtct 420 

cgtactaact tcatcaaacg catcattatt ctggataccg tcgaaaacat ccacggctgt 480 

gagagcctcc ctaacttcat ctctcgttac agcgatggta atatcgctaa tttcaagccc 540 

ttgcattttg atccagtcga gcaagtggcc gctattttgt gctcctccgg caccactggt 600 

ISttgcctaaag gtgtcatgca gactcaccag aatatctgtg tgcgtttgat ccacgctctc 660 

gaccctcgtg tgggtactca attgatccct ggcgtgactg tgctggtgta tctgcctttc 72 0 

tttcacgcct ttggtttctc tattaccctg ggctatttca tggtcggctt gcgtgtcatc 780 

atgtttcgtc gcttcgacca agaagccttc ttgaaggcta ttcaagacta cgaggtgcgt 840 

tccgtgatca acgtcccttc agtcattttg ttcctgagca aatctccttt ggttgacaag 900 

20tatgatctga gcagcttgcg tgagctgtgc tgtggcgctg ctcctttggc caaagaagtg 960 

gccgaggtcg ctgctaagcg tctgaacctc cctggtatcc gctgcggttt tggtttgact 1020 

gagagcactt ctgctaacat ccatagcttg cgagacgagt ttaagtctgg tagcctgggt 1080 

cgcgtgactc ctcttatggc tgcaaagatc gccgaccgtg agaccggcaa agcactgggc 1140 

ccaaatcaag tcggtgaatt gtgtattaag ggccctatgg tctctaaagg ctacgtgaac 1200 

25aatgtggagg ccactaaaga agccattgat gatgatggct ggctccatag cggcgacttc 1260 

ggttactatg atgaggacga acacttctat gtggtcgatc gctacaaaga attgattaag 132 0 

tacaaaggct ctcaagtcgc accagccgaa ctggaagaaa ttttgctgaa gaacccttgt 1380 

atccgcgacg tggccgtcgt gggtatccca gacttggaag ctggcgagtt gcctagcgcc 1440 

tttgtggtga aacaacccgg caaggagatc actgctaagg aggtctacga ctatttggcc 1500 

gagcgcgtgt ctcacaccaa atatctgcgt ggcggcgtcc gcttcgtcga ttctattcca 1560

cgcaacgtta ccggtaagat cactcgtaaa gagttgctga agcaactcct cgaaaaagct 1620 

ggcggc 1626 

<210> 298 
<211> 542
<212> PRT 

<213> Artificial Sequence 



<220> 

<223> Sequence of a synthetic luciferase

<400> 298

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Val
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe Ser Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Asn Ile His Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540


<210> 299 
<211> 1626 
<212> DNA 

<213> Artificial Sequence 


<220> 

<223> Sequence of a synthetic luciferase



<400> 299 

atggtgaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gcgctattat tcagtctctc cgcgatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg taaagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggagaaggcc 1620

ggcggt 1626

<210> 300 
<211> 542
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Sequence of a synthetic luciferase



<400> 300 

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335
Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Ser Leu Arg Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Lys Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Glu Lys Ala Gly Gly
530 535 540



<210> 301 
<211> 1626 
<212> DNA 
<213> Artificial Sequence



<220> 

<223> Sequence of a synthetic luciferase 



<400> 301

atggtaaagc gtgagaaaaa tgtcatctat ggccctgagc ctctccatcc tttggaggat 60 

ttgactgccg gcgaaatgct gtttcgtgct ctccgcaagc actctcattt gcctcaagcc 120 

ttggtcgatg tggtcggcga tgaatctttg agctacaagg agttttttga ggcaaccgtc 180 

ttgctggctc agtccctcca caattgtggc tacaagatga acgacgtcgt tagtatctgt 240 

gctgaaaaca atacccgttt cttcattcca gtcatcgccg catggtatat cggtatgatc 300

gtggctccag tcaacgagag ctacattccc gacgaactgt gtaaagtcat gggtatctct 360

aagccacaga ttgtcttcac cactaagaat attctgaaca aagtcctgga agtccaaagc 420

cgcaccaact ttattaagcg tatcatcatc ttggacactg tggagaatat tcacggttgc 480

gaatctttgc ctaatttcat ctctcgctat tcagacggca acatcgcaaa ctttaaacca 540

ctccacttcg accctgtgga acaagttgca gccattctgt gtagcagcgg tactactgga 600

ctcccaaagg gagtcatgca gacccatcaa aacatttgcg tgcgtctgat ccatgctctc 660

gatccacgct acggcactca gctgattcct ggtgtcaccg tcttggtcta cttgcctttc 720

ttccatgctt tcggctttca tattactttg ggttacttta tggtcggtct ccgcgtgatt 780

atgttccgcc gttttgatca ggaggctttc ttgaaagcca tccaagatta tgaagtccgc 840

agtgtcatca acgtgcctag cgtgatcctg tttttgtcta agagcccact cgtggacaag 900

tacgacttgt cttcactgcg tgaattgtgt tgcggtgccg ctccactggc taaggaggtc 960

gctgaagtgg ccgccaaacg cttgaatctt ccagggattc gttgtggctt cggcctcacc 1020

gaatctacca gtgcgattat ccagactctc ggggatgagt ttaagagcgg ctctttgggc 1080

cgtgtcactc cactcatggc tgctaagatc gctgatcgcg aaactggtaa ggctttgggc 1140

ccgaaccaag tgggcgagct gtgtatcaaa ggccctatgg tgagcaaggg ttatgtcaat 1200

aacgttgaag ctaccaagga ggccatcgac gacgacggct ggttgcattc tggtgatttt 1260

ggatattacg acgaagatga gcatttttac gtcgtggatc gttacaagga gctgatcaaa 1320

tacaagggta gccaggttgc tccagctgag ttggaggaga ttctgttgaa aaatccatgc 1380

attcgcgatg tcgctgtggt cggcattcct gatctggagg ccggcgaact gccttctgct 1440

ttcgttgtca agcagcctgg tacagaaatt accgccaaag aagtgtatga ttacctggct 1500

gaacgtgtga gccatactaa gtacttgcgt ggcggcgtgc gttttgttga ctccatccct 1560

cgtaacgtaa caggcaaaat tacccgcaag gagctgttga aacaattgtt ggtgaaggcc 1620

ggcggt 1626



<210> 302
<211> 542 
<212> PRT 

<213> Artificial Sequence 



<220>

<223> Sequence of a synthetic luciferase

<400> 302

Met Val Lys Arg Glu Lys Asn Val Ile Tyr Gly Pro Glu Pro Leu His
1 5 10 15

Pro Leu Glu Asp Leu Thr Ala Gly Glu Met Leu Phe Arg Ala Leu Arg
20 25 30

Lys His Ser His Leu Pro Gln Ala Leu Val Asp Val Val Gly Asp Glu
35 40 45

Ser Leu Ser Tyr Lys Glu Phe Phe Glu Ala Thr Val Leu Leu Ala Gln
50 55 60

Ser Leu His Asn Cys Gly Tyr Lys Met Asn Asp Val Val Ser Ile Cys
65 70 75 80

Ala Glu Asn Asn Thr Arg Phe Phe Ile Pro Val Ile Ala Ala Trp Tyr
85 90 95

Ile Gly Met Ile Val Ala Pro Val Asn Glu Ser Tyr Ile Pro Asp Glu
100 105 110

Leu Cys Lys Val Met Gly Ile Ser Lys Pro Gln Ile Val Phe Thr Thr
115 120 125

Lys Asn Ile Leu Asn Lys Val Leu Glu Val Gln Ser Arg Thr Asn Phe
130 135 140

Ile Lys Arg Ile Ile Ile Leu Asp Thr Val Glu Asn Ile His Gly Cys
145 150 155 160

Glu Ser Leu Pro Asn Phe Ile Ser Arg Tyr Ser Asp Gly Asn Ile Ala
165 170 175

Asn Phe Lys Pro Leu His Phe Asp Pro Val Glu Gln Val Ala Ala Ile
180 185 190

Leu Cys Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Met Gln Thr
195 200 205

His Gln Asn Ile Cys Val Arg Leu Ile His Ala Leu Asp Pro Arg Tyr
210 215 220

Gly Thr Gln Leu Ile Pro Gly Val Thr Val Leu Val Tyr Leu Pro Phe
225 230 235 240

Phe His Ala Phe Gly Phe His Ile Thr Leu Gly Tyr Phe Met Val Gly
245 250 255

Leu Arg Val Ile Met Phe Arg Arg Phe Asp Gln Glu Ala Phe Leu Lys
260 265 270

Ala Ile Gln Asp Tyr Glu Val Arg Ser Val Ile Asn Val Pro Ser Val
275 280 285

Ile Leu Phe Leu Ser Lys Ser Pro Leu Val Asp Lys Tyr Asp Leu Ser
290 295 300

Ser Leu Arg Glu Leu Cys Cys Gly Ala Ala Pro Leu Ala Lys Glu Val
305 310 315 320

Ala Glu Val Ala Ala Lys Arg Leu Asn Leu Pro Gly Ile Arg Cys Gly
325 330 335

Phe Gly Leu Thr Glu Ser Thr Ser Ala Ile Ile Gln Thr Leu Gly Asp
340 345 350

Glu Phe Lys Ser Gly Ser Leu Gly Arg Val Thr Pro Leu Met Ala Ala
355 360 365

Lys Ile Ala Asp Arg Glu Thr Gly Lys Ala Leu Gly Pro Asn Gln Val
370 375 380

Gly Glu Leu Cys Ile Lys Gly Pro Met Val Ser Lys Gly Tyr Val Asn
385 390 395 400

Asn Val Glu Ala Thr Lys Glu Ala Ile Asp Asp Asp Gly Trp Leu His
405 410 415

Ser Gly Asp Phe Gly Tyr Tyr Asp Glu Asp Glu His Phe Tyr Val Val
420 425 430

Asp Arg Tyr Lys Glu Leu Ile Lys Tyr Lys Gly Ser Gln Val Ala Pro
435 440 445

Ala Glu Leu Glu Glu Ile Leu Leu Lys Asn Pro Cys Ile Arg Asp Val
450 455 460

Ala Val Val Gly Ile Pro Asp Leu Glu Ala Gly Glu Leu Pro Ser Ala
465 470 475 480

Phe Val Val Lys Gln Pro Gly Thr Glu Ile Thr Ala Lys Glu Val Tyr
485 490 495

Asp Tyr Leu Ala Glu Arg Val Ser His Thr Lys Tyr Leu Arg Gly Gly
500 505 510

Val Arg Phe Val Asp Ser Ile Pro Arg Asn Val Thr Gly Lys Ile Thr
515 520 525

Arg Lys Glu Leu Leu Lys Gln Leu Leu Val Lys Ala Gly Gly
530 535 540



